
Add Full Rank Approximation #289

Merged
merged 10 commits into pymc-devs:master on Jul 27, 2020

Conversation

@Sayam753 (Member) commented Jun 29, 2020

This PR adds the Full Rank ADVI interface. It is ready for review. Just to experiment with the API, I have created a gist inspired by @ColCarroll's notebook (a minimal usage sketch also follows the task list below). I look forward to including mixture comparisons as well.

  • Add Full Rank ADVI
  • Add docstrings
  • Add tests
  • Change variable flattening to model flattening
  • Add Interval transformation
  • Rerun quick start notebook
  • Add deterministic callbacks
  • Add tests for deterministic callbacks
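
To make the new API concrete, here is a minimal usage sketch. The toy model and the distribution signatures in it are illustrative assumptions; the pm.fit call mirrors the one quoted later in this thread.

    import pymc4 as pm

    @pm.model
    def toy_model():
        # hypothetical two-parameter model
        mu = yield pm.Normal("mu", 0.0, 1.0)
        sigma = yield pm.HalfNormal("sigma", 1.0)

    # method="fullrank_advi" selects the Full Rank approximation added here;
    # fit returns an ADVIFit holding the approximation and the per-step losses
    approx = pm.fit(toy_model(), num_steps=10000, method="fullrank_advi")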

@codecov (bot) commented Jun 29, 2020

Codecov Report

Merging #289 into master will increase coverage by 0.52%.
The diff coverage is 93.27%.

@@            Coverage Diff             @@
##           master     #289      +/-   ##
==========================================
+ Coverage   89.50%   90.02%   +0.52%     
==========================================
  Files          32       33       +1     
  Lines        2400     2447      +47     
==========================================
+ Hits         2148     2203      +55     
+ Misses        252      244       -8     
Impacted Files Coverage Δ
pymc4/flow/transformed_executor.py 92.85% <ø> (ø)
pymc4/distributions/distribution.py 88.16% <77.77%> (-0.66%) ⬇️
pymc4/distributions/continuous.py 98.28% <80.00%> (+0.03%) ⬆️
pymc4/distributions/discrete.py 98.05% <91.66%> (+0.09%) ⬆️
pymc4/distributions/transforms.py 78.26% <92.59%> (+2.39%) ⬆️
pymc4/variational/util.py 93.33% <93.33%> (ø)
pymc4/variational/approximations.py 91.20% <100.00%> (+10.62%) ⬆️
pymc4/flow/meta_executor.py 87.50% <0.00%> (+2.50%) ⬆️

@fonnesbeck (Member):
Just testing this now. Not sure what to make of this:

[screenshot: loss values output]

Should be 10K loss values, correct?

@Sayam753 (Member, Author) commented Jul 1, 2020

Hi @fonnesbeck,
Thanks for checking this out. The implementation of Full Rank ADVI (especially the model flattening part) is wrong. I am working on this; it will be resolved soon.

@fonnesbeck (Member):
Mean field ADVI does not run at all. Fails with:

ValueError: Dimensions must be equal, but are 4 and 1086 for '{{node monte_carlo_variational_loss/expectation/JointDistributionSequential/log_prob/add_1}} = AddV2[T=DT_FLOAT](monte_carlo_variational_loss/expectation/JointDistributionSequential/log_prob/add, monte_carlo_variational_loss/expectation/JointDistributionSequential/log_prob/Normal_1/log_prob/sub)' with input shapes: [1,4], [1,1086].

@Sayam753 (Member, Author) commented Jul 1, 2020

Seems like a shape issue. Can you share a reproducible code snippet?
I suspect adjusting the sample_size parameter of the fit function may help, as in the example below.
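
For reference, sample_size is one of the keyword arguments of the fit function visible in the traceback later in this thread; a hedged example of the suggestion:

    # hypothetical invocation; sample_size presumably controls how many
    # Monte Carlo samples are drawn per step to estimate the ELBO
    approx = pm.fit(model, method="advi", num_steps=10000, sample_size=10)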

@Sayam753 changed the title from "Add Full Rank Approximation" to "WIP: Add Full Rank Approximation" on Jul 2, 2020
[Resolved review thread on pymc4/variational/util.py]
@fonnesbeck (Member):
Is this supposed to be working now? I'm getting the following:

[screenshot: loss becomes NaN after a few iterations]

It was running with the original PR, though not getting good results. Now it quits after a handful of iterations with NaNs.

@ferrine (Member) commented Jul 4, 2020

There are some concerns about transformations not being stable

@fonnesbeck (Member):
Same behavior for mean-field ADVI

@Sayam753 (Member, Author) commented Jul 7, 2020

Yes. Some transformations were missing for bounded distributions; I had opened issue #283 highlighting this some time back. Commit d61431b adds the remaining transformations. Before this commit, passing validate_args=True to each distribution while doing VI led to various constraint errors from TFP.

But I do not feel that manually adding transformations to the respective distributions is a good approach. I tried to make it more general with a single class -

import tensorflow as tf
from tensorflow_probability import bijectors as tfb
# BackwardTransform is the base class in pymc4.distributions.transforms

class Interval(BackwardTransform):
    name = "interval"

    def __init__(self, lower_limit, upper_limit):
        # Pick a bijector based on which bounds are finite. Note that
        # tf.cond expects its branches to return tensors, not Python
        # objects such as bijectors, which is where the eager execution
        # issues mentioned below come from.
        transform = tf.cond(
            tf.math.is_inf(lower_limit),
            lambda: tf.cond(
                tf.math.is_inf(upper_limit),
                lambda: tfb.Identity(),  # unbounded
                lambda: tfb.Chain([tfb.Shift(upper_limit), tfb.Scale(-1), tfb.Exp()]),  # upper - exp(x)
            ),
            lambda: tf.cond(
                tf.math.is_inf(upper_limit),
                lambda: tfb.Chain([tfb.Shift(lower_limit), tfb.Exp()]),  # exp(x) + lower
                lambda: tfb.Sigmoid(low=lower_limit, high=upper_limit),  # interval
            ),
        )
        super().__init__(transform)

But this leads to many eager execution issues with the lambda functions. Please suggest if there is a way to improve the existing approach; one possible workaround is sketched below.
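
For context, one workaround, assuming the bounds are static Python or NumPy scalars rather than tensors, is to branch at the Python level so that each case simply returns a bijector object. This is a sketch only, not the approach taken in this PR:

    import numpy as np
    from tensorflow_probability import bijectors as tfb

    def interval_bijector(lower_limit, upper_limit):
        # Plain Python branching avoids tf.cond entirely, because no graph
        # op has to choose between bijector objects at runtime.
        if np.isinf(lower_limit) and np.isinf(upper_limit):
            return tfb.Identity()  # unbounded
        if np.isinf(lower_limit):
            return tfb.Chain([tfb.Shift(upper_limit), tfb.Scale(-1.0), tfb.Exp()])  # upper - exp(x)
        if np.isinf(upper_limit):
            return tfb.Chain([tfb.Shift(lower_limit), tfb.Exp()])  # exp(x) + lower
        return tfb.Sigmoid(low=lower_limit, high=upper_limit)  # interval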

@ferrine (Member) commented Jul 7, 2020

@fonnesbeck, what is the park bias model?

@fonnesbeck (Member):
@ferrine it's a relatively large model that I was using to test ADVI, ported over from PyMC3. It's got a hierarchical component as well as Gaussian processes. It samples very slowly, so it is a good candidate for ADVI.

@Sayam753 (Member, Author) commented Jul 7, 2020

Hi @fonnesbeck,
The model seems interesting. Can you experiment again in your free time with the current state of the PR? The issues related to transformations have been resolved.
Thanks

@fonnesbeck (Member) commented Jul 7, 2020

With the current PR, I am getting the following failure:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-29-6a1ed8631a36> in <module>()
      1 m = park_bias_model()
      2 # trace = pm.sample(m, num_samples=100, burn_in=2000, xla=True)
----> 3 approx = pm.fit(m, num_steps=20000, method='fullrank_advi')

6 frames
/usr/local/lib/python3.6/dist-packages/pymc4/variational/approximations.py in fit(model, method, num_steps, sample_size, random_seed, optimizer, **kwargs)
    246         return losses
    247 
--> 248     return ADVIFit(inference, run_approximation())

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    778       else:
    779         compiler = "nonXla"
--> 780         result = self._call(*args, **kwds)
    781 
    782       new_tracing_count = self._get_tracing_count()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    844               *args, **kwds)
    845       # If we did not create any variables the trace we have is good enough.
--> 846       return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
    847 
    848     def fn_with_cond(*inner_args, **inner_kwds):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _filtered_call(self, args, kwargs, cancellation_manager)
   1845                            resource_variable_ops.BaseResourceVariable))],
   1846         captured_inputs=self.captured_inputs,
-> 1847         cancellation_manager=cancellation_manager)
   1848 
   1849   def _call_flat(self, args, captured_inputs, cancellation_manager=None):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1921       # No tape is watching; skip to running the function.
   1922       return self._build_call_outputs(self._inference_function.call(
-> 1923           ctx, args, cancellation_manager=cancellation_manager))
   1924     forward_backward = self._select_forward_and_backward_functions(
   1925         args,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)
    548               inputs=args,
    549               attrs=attrs,
--> 550               ctx=ctx)
    551         else:
    552           outputs = execute.execute_with_cancellation(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  Cholesky decomposition was not successful. The input might not be valid.
	 [[{{node fit_surrogate_posterior/while/body/_22/fit_surrogate_posterior/while/StatefulPartitionedCall/monte_carlo_variational_loss/expectation/loop_body/PartitionedCall/pfor/PartitionedCall/Cholesky/pfor/Cholesky}}]]
	 [[fit_surrogate_posterior/while/body/_22/fit_surrogate_posterior/while/StatefulPartitionedCall/Adam/Adam/AssignAddVariableOp/_51]]
  (1) Invalid argument:  Cholesky decomposition was not successful. The input might not be valid.
	 [[{{node fit_surrogate_posterior/while/body/_22/fit_surrogate_posterior/while/StatefulPartitionedCall/monte_carlo_variational_loss/expectation/loop_body/PartitionedCall/pfor/PartitionedCall/Cholesky/pfor/Cholesky}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_run_approximation_207085]

Function call stack:
run_approximation -> run_approximation

@fonnesbeck (Member):
Can you rebase to current master so that all the covariance functions are available?

@Sayam753 changed the title from "WIP: Add Full Rank Approximation" to "Add Full Rank Approximation" on Jul 27, 2020
Comment on lines +136 to +138:

    dtype = dtype_util.common_dtype(
        self.state.all_unobserved_values.values(), dtype_hint=tf.float64
    )
@Sayam753 (Member, Author) commented:

While figuring out the reasons for the Cholesky decomposition failure, this comment has been very helpful. Here are a few observations from experimenting with VI and Gaussian Processes (a quick reproduction of the precision issue is sketched after this list):

  1. High learning rates (on the order of 1e-2) lead to Cholesky decomposition errors.
  2. Both Mean Field and Full Rank ADVI also hit the decomposition error with tf.float32, even if we add a small WhiteNoise to the diagonal. Testing ADVI with tf.float64 gives much greater numerical stability of the covariance matrix, so I changed the dtype_hint to tf.float64.
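
As a hedged illustration of that float32 sensitivity (not code from this PR), a near-singular GP-style kernel matrix decomposes fine in float64 with a small diagonal jitter, but typically triggers the same Cholesky error in float32:

    import numpy as np
    import tensorflow as tf

    # squared-exponential kernel at closely spaced inputs: positive definite
    # in exact arithmetic, but its smallest eigenvalues fall far below
    # float32 precision
    x = np.linspace(0.0, 1.0, 50)
    cov = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.5 ** 2)

    # float64 plus a small jitter on the diagonal (the "WhiteNoise") is stable
    tf.linalg.cholesky(tf.constant(cov + 1e-6 * np.eye(50), dtype=tf.float64))

    # float32 typically fails with "Cholesky decomposition was not successful."
    try:
        tf.linalg.cholesky(tf.constant(cov, dtype=tf.float32))
    except tf.errors.InvalidArgumentError as err:
        print(err.message)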

@Sayam753 requested a review from @ferrine on Jul 27, 2020 at 07:44
@ferrine (Member) left a review comment:

Looks good. We had a long conversation figuring out the source of the instabilities in the park_bias model, and it looks like something is wrong with the Cholesky implementation in float32 mode. We double-checked possible sources of numerical instability in the algorithm itself and failed to spot an obvious error. I'd rather leave this investigation on an ongoing basis; I hope most models do not suffer from these problems, or we can spot the source of the problem later.

@ferrine (Member) commented Jul 27, 2020

I'll also leave this experiment by @Sayam753 here:
[screenshot: experiment results]

@ferrine merged commit 4097b8a into pymc-devs:master on Jul 27, 2020
@twiecki (Member) commented Jul 27, 2020

This is a major feature, congrats @Sayam753!

@ferrine should we open an issue to keep track of the float32 Cholesky instability?

@twiecki (Member) commented Jul 27, 2020

In another PR I would add full rank to the example notebook, which still reads "Full Rank ADVI - Coming Soon".
