MEFISTO with Poisson likelihood #2

Open
willtownes opened this issue Jun 22, 2021 · 4 comments

@willtownes

Hi, congratulations on this awesome method! I am interested in trying MEFISTO with the Poisson likelihood. I have been following the tutorial for the Visium brain data, but it seems to run into numerical problems after the first few iterations. Here is the code I have been using:

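from mofapy2.run.entry_point import entry_point

# ad (AnnData object), ne (number of iterations) and M (number of
# inducing points) are assumed to be defined earlier in the session.
ent = entry_point()
ent.set_data_options(use_float32=True)
ad.raw = ad  # expose the current matrix via .raw so use_raw=True picks it up
ent.set_data_from_anndata(ad, use_raw=True, likelihoods="poisson")
ent.set_model_options(factors=4)
ent.set_train_options(iter=ne)
# Spatial coordinates serve as the covariates for the smooth factors
ent.set_covariates([ad.obsm["spatial"]], covariates_names=["imagerow", "imagecol"])
# Sparse GP with M inducing points; GP optimisation starts at iteration 10
ent.set_smooth_options(sparseGP=True, frac_inducing=M/ad.n_obs,
                       start_opt=10, opt_freq=10)
ent.build()
%time ent.run()  # IPython/Jupyter magic; use plain ent.run() in a script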

At iteration 12 the ELBO becomes nan and after iteration 19 it says "Optimising sigma node..." then raises an exception:
UnboundLocalError: local variable 'best_lidx' referenced before assignment

@bv2
Contributor

bv2 commented Jun 28, 2021

Hi @willtownes,

thanks for your interest in the method.

In most cases we recommend using the Gaussian likelihood in combination with suitable preprocessing (see also some guidelines/recommendations here). This takes the data characteristics into account while providing a good tradeoff between scalability and performance. We added a small-scale simulation example as a simple illustration of the Poisson likelihood here.
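
As a rough illustration of the Gaussian route recommended above, the sketch below applies library-size normalisation followed by a log1p transform to a count matrix, using NumPy only. The function name `log_normalize`, the `target_sum` parameter and the toy matrix are assumptions made for this example and are not taken from the linked guidelines. The transformed matrix would then be passed to the model with `likelihoods="gaussian"` instead of `"poisson"`.

```python
import numpy as np

def log_normalize(counts, target_sum=1e4):
    """Scale each row (spot) to `target_sum` total counts, then apply log1p.

    Hypothetical helper: one common preprocessing choice for count data,
    not necessarily the one the maintainers have in mind.
    """
    counts = np.asarray(counts, dtype=np.float64)
    lib_size = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for empty spots; their rows stay all-zero.
    lib_size[lib_size == 0] = 1.0
    return np.log1p(counts / lib_size * target_sum)

# Toy spots-by-genes count matrix (made up for illustration)
counts = np.array([[1, 3], [0, 0], [5, 5]])
normalized = log_normalize(counts, target_sum=4)
```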

The numerical problems that produce the nan values seem to be an issue in the underlying MOFA model on this data set. We will take a look at this and let you know once it is fixed. Thanks for reporting the bug!

@bv2
Contributor

bv2 commented Jul 2, 2021

Hi @willtownes,

just a quick update: we fixed the numerical issues you encountered with the Poisson likelihood. If you install mofapy2 from the dev branch (pip install git+https://github.com/bioFAM/mofapy2@dev), the error above should be fixed. We will merge this into the master branch and release it on PyPI in an upcoming version. However, as mentioned above, a Gaussian likelihood with suitable preprocessing might still be a better choice for spatial transcriptomics data.

@willtownes
Author

OK, I have tested this. While it no longer hits the numerical divergence error early in training, it shows some weird behavior and never converges. Below is a plot of the ELBO, with the horizontal axis representing the number of epochs. I'm not sure why the ELBO periodically drops precipitously.
[image: plot of the ELBO against the number of epochs]
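
To pin down where such drops happen, a minimal sketch like the one below could flag the iterations at which the ELBO falls sharply. It assumes the ELBO trace is already available as a plain sequence of floats; how to extract it from the trained model is not shown here. The function name `find_elbo_drops`, the `rel_tol` threshold and the example trace are all hypothetical.

```python
import numpy as np

def find_elbo_drops(elbo, rel_tol=0.01):
    """Return indices i where elbo[i] is lower than elbo[i-1] by more
    than rel_tol * |elbo[i-1]| (hypothetical diagnostic helper)."""
    elbo = np.asarray(elbo, dtype=np.float64)
    diffs = np.diff(elbo)                    # elbo[i] - elbo[i-1]
    threshold = rel_tol * np.abs(elbo[:-1])  # per-step tolerance
    return np.flatnonzero(diffs < -threshold) + 1

# Made-up trace for illustration only, not the data from the plot
trace = [-1000.0, -900.0, -850.0, -1200.0, -840.0, -835.0, -1100.0]
drops = find_elbo_drops(trace)
```

Comparing the spacing of the flagged indices with the `opt_freq` value used in the original setup is one thing that could be checked; whether the two coincide is not stated in this thread.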

@bv2
Contributor

bv2 commented Jul 9, 2021

Hi Will,
this looks strange; we will have a look. It seems to be specific to the combination of sparse GPs with a Poisson likelihood. For now, we'd recommend using either a Gaussian likelihood, or a Poisson likelihood in conjunction with a full GP model (setting sparseGP=False). I will update here once we have fixed the problem above.
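
A minimal sketch of switching between the sparse and the full GP setting is shown below. The helper `smooth_kwargs` is hypothetical; it only assembles the keyword arguments that already appear in the `set_smooth_options` call from the original post, and leaves out `frac_inducing` when the full GP is requested.

```python
def smooth_kwargs(sparse_gp, n_obs=None, n_inducing=None,
                  start_opt=10, opt_freq=10):
    """Build keyword arguments for ent.set_smooth_options.

    Hypothetical helper; argument names mirror the original post.
    """
    kwargs = {"sparseGP": sparse_gp, "start_opt": start_opt,
              "opt_freq": opt_freq}
    if sparse_gp:
        if not n_obs or not n_inducing:
            raise ValueError("a sparse GP needs n_obs and n_inducing")
        # Fraction of observations used as inducing points
        kwargs["frac_inducing"] = n_inducing / n_obs
    return kwargs

# Full GP, as suggested for the Poisson likelihood
full_gp = smooth_kwargs(sparse_gp=False)

# Usage with the entry point from the original post:
# ent.set_smooth_options(**full_gp)
```

Note that a full GP does not use inducing points, so it is expected to be more expensive in time and memory on large data sets than the sparse setting.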
