
Add Full Rank Approximation #289

Merged
merged 10 commits into pymc-devs:master on Jul 27, 2020

Conversation

@Sayam753 (Member) commented Jun 29, 2020

This PR adds the Full Rank ADVI interface. It is ready for review. Just to experiment with the API, I have created a gist inspired by @ColCarroll's notebook (a minimal usage sketch also follows the task list below). I look forward to including mixture comparisons as well.

  • Add Full Rank ADVI
  • Add docstrings
  • Add tests
  • Change variable flattening to model flattening
  • Add Interval transformation
  • Rerun quick start notebook
  • Add deterministic callbacks
  • Add tests for deterministic callbacks
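
To make the new API concrete, here is a minimal usage sketch. The toy model and the distribution signatures in it are illustrative assumptions; the pm.fit call mirrors the one quoted later in this thread.

    import pymc4 as pm

    @pm.model
    def toy_model():
        # hypothetical two-parameter model
        mu = yield pm.Normal("mu", 0.0, 1.0)
        sigma = yield pm.HalfNormal("sigma", 1.0)

    # method="fullrank_advi" selects the Full Rank approximation added here;
    # fit returns an ADVIFit holding the approximation and the per-step losses
    approx = pm.fit(toy_model(), num_steps=10000, method="fullrank_advi")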

@codecov (bot) commented Jun 29, 2020

Codecov Report

Merging #289 into master will increase coverage by 0.52%.
The diff coverage is 93.27%.

@@            Coverage Diff             @@
##           master     #289      +/-   ##
==========================================
+ Coverage   89.50%   90.02%   +0.52%     
==========================================
  Files          32       33       +1     
  Lines        2400     2447      +47     
==========================================
+ Hits         2148     2203      +55     
+ Misses        252      244       -8     
Impacted Files Coverage Δ
pymc4/flow/transformed_executor.py 92.85% <ø> (ø)
pymc4/distributions/distribution.py 88.16% <77.77%> (-0.66%) ⬇️
pymc4/distributions/continuous.py 98.28% <80.00%> (+0.03%) ⬆️
pymc4/distributions/discrete.py 98.05% <91.66%> (+0.09%) ⬆️
pymc4/distributions/transforms.py 78.26% <92.59%> (+2.39%) ⬆️
pymc4/variational/util.py 93.33% <93.33%> (ø)
pymc4/variational/approximations.py 91.20% <100.00%> (+10.62%) ⬆️
pymc4/flow/meta_executor.py 87.50% <0.00%> (+2.50%) ⬆️

@fonnesbeck (Member):
Just testing this now. Not sure what to make of this:

[screenshot: loss values output]

Should be 10K loss values, correct?

@Sayam753 (Member, Author) commented Jul 1, 2020

Hi @fonnesbeck,
Thanks for checking this out. The implementation of Full Rank ADVI (especially the model flattening part) is wrong. I am working on this; it will be resolved soon.

@fonnesbeck (Member):
Mean field ADVI does not run at all. Fails with:

ValueError: Dimensions must be equal, but are 4 and 1086 for '{{node monte_carlo_variational_loss/expectation/JointDistributionSequential/log_prob/add_1}} = AddV2[T=DT_FLOAT](monte_carlo_variational_loss/expectation/JointDistributionSequential/log_prob/add, monte_carlo_variational_loss/expectation/JointDistributionSequential/log_prob/Normal_1/log_prob/sub)' with input shapes: [1,4], [1,1086].

@Sayam753 (Member, Author) commented Jul 1, 2020

Seems like a shape issue. Can you share a reproducible code snippet?
I suspect adjusting the sample_size parameter of the fit function may help, as in the example below.
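
For reference, sample_size is one of the keyword arguments of the fit function visible in the traceback later in this thread; a hedged example of the suggestion:

    # hypothetical invocation; sample_size presumably controls how many
    # Monte Carlo samples are drawn per step to estimate the ELBO
    approx = pm.fit(model, method="advi", num_steps=10000, sample_size=10)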

@Sayam753 changed the title from "Add Full Rank Approximation" to "WIP: Add Full Rank Approximation" on Jul 2, 2020
[Resolved review thread on pymc4/variational/util.py]
@fonnesbeck (Member):
Is this supposed to be working now? I'm getting the following:

[screenshot: loss becomes NaN after a few iterations]

It was running with the original PR, though not getting good results. Now it quits after a handful of iterations with NaNs.

@ferrine (Member) commented Jul 4, 2020

There are some concerns about transformations not being stable

@fonnesbeck (Member):
Same behavior for mean-field ADVI

@Sayam753 (Member, Author) commented Jul 7, 2020

Yes. Some transformations were missing for bounded distributions; I had opened issue #283 highlighting this some time back. Commit d61431b adds the remaining transformations. Before this commit, passing validate_args=True to each distribution while doing VI led to various constraint errors from TFP.

But I do not feel that manually adding transformations to the respective distributions is a good approach. I tried to make it more general with a single class -

import tensorflow as tf
from tensorflow_probability import bijectors as tfb
# BackwardTransform is the base class in pymc4.distributions.transforms

class Interval(BackwardTransform):
    name = "interval"

    def __init__(self, lower_limit, upper_limit):
        # Pick a bijector based on which bounds are finite. Note that
        # tf.cond expects its branches to return tensors, not Python
        # objects such as bijectors, which is where the eager execution
        # issues mentioned below come from.
        transform = tf.cond(
            tf.math.is_inf(lower_limit),
            lambda: tf.cond(
                tf.math.is_inf(upper_limit),
                lambda: tfb.Identity(),  # unbounded
                lambda: tfb.Chain([tfb.Shift(upper_limit), tfb.Scale(-1), tfb.Exp()]),  # upper - exp(x)
            ),
            lambda: tf.cond(
                tf.math.is_inf(upper_limit),
                lambda: tfb.Chain([tfb.Shift(lower_limit), tfb.Exp()]),  # exp(x) + lower
                lambda: tfb.Sigmoid(low=lower_limit, high=upper_limit),  # interval
            ),
        )
        super().__init__(transform)

But this leads to many eager execution issues with the lambda functions. Please suggest if there is a way to improve the existing approach; one possible workaround is sketched below.
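
For context, one workaround, assuming the bounds are static Python or NumPy scalars rather than tensors, is to branch at the Python level so that each case simply returns a bijector object. This is a sketch only, not the approach taken in this PR:

    import numpy as np
    from tensorflow_probability import bijectors as tfb

    def interval_bijector(lower_limit, upper_limit):
        # Plain Python branching avoids tf.cond entirely, because no graph
        # op has to choose between bijector objects at runtime.
        if np.isinf(lower_limit) and np.isinf(upper_limit):
            return tfb.Identity()  # unbounded
        if np.isinf(lower_limit):
            return tfb.Chain([tfb.Shift(upper_limit), tfb.Scale(-1.0), tfb.Exp()])  # upper - exp(x)
        if np.isinf(upper_limit):
            return tfb.Chain([tfb.Shift(lower_limit), tfb.Exp()])  # exp(x) + lower
        return tfb.Sigmoid(low=lower_limit, high=upper_limit)  # interval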

@ferrine (Member) commented Jul 7, 2020

@fonnesbeck, what is the park bias model?

@fonnesbeck (Member):
@ferrine it's a relatively large model that I was using to test ADVI, ported over from PyMC3. It's got a hierarchical component as well as Gaussian processes. It samples very slowly, so it is a good candidate for ADVI.

@Sayam753 (Member, Author) commented Jul 7, 2020

Hi @fonnesbeck,
The model seems interesting. Can you experiment again in your free time with the current state of the PR? The issues related to transformations have been resolved.
Thanks

@fonnesbeck (Member) commented Jul 7, 2020

With the current PR, I am getting the following failure:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-29-6a1ed8631a36> in <module>()
      1 m = park_bias_model()
      2 # trace = pm.sample(m, num_samples=100, burn_in=2000, xla=True)
----> 3 approx = pm.fit(m, num_steps=20000, method='fullrank_advi')

6 frames
/usr/local/lib/python3.6/dist-packages/pymc4/variational/approximations.py in fit(model, method, num_steps, sample_size, random_seed, optimizer, **kwargs)
    246         return losses
    247 
--> 248     return ADVIFit(inference, run_approximation())

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    778       else:
    779         compiler = "nonXla"
--> 780         result = self._call(*args, **kwds)
    781 
    782       new_tracing_count = self._get_tracing_count()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    844               *args, **kwds)
    845       # If we did not create any variables the trace we have is good enough.
--> 846       return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
    847 
    848     def fn_with_cond(*inner_args, **inner_kwds):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _filtered_call(self, args, kwargs, cancellation_manager)
   1845                            resource_variable_ops.BaseResourceVariable))],
   1846         captured_inputs=self.captured_inputs,
-> 1847         cancellation_manager=cancellation_manager)
   1848 
   1849   def _call_flat(self, args, captured_inputs, cancellation_manager=None):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1921       # No tape is watching; skip to running the function.
   1922       return self._build_call_outputs(self._inference_function.call(
-> 1923           ctx, args, cancellation_manager=cancellation_manager))
   1924     forward_backward = self._select_forward_and_backward_functions(
   1925         args,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)
    548               inputs=args,
    549               attrs=attrs,
--> 550               ctx=ctx)
    551         else:
    552           outputs = execute.execute_with_cancellation(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  Cholesky decomposition was not successful. The input might not be valid.
	 [[{{node fit_surrogate_posterior/while/body/_22/fit_surrogate_posterior/while/StatefulPartitionedCall/monte_carlo_variational_loss/expectation/loop_body/PartitionedCall/pfor/PartitionedCall/Cholesky/pfor/Cholesky}}]]
	 [[fit_surrogate_posterior/while/body/_22/fit_surrogate_posterior/while/StatefulPartitionedCall/Adam/Adam/AssignAddVariableOp/_51]]
  (1) Invalid argument:  Cholesky decomposition was not successful. The input might not be valid.
	 [[{{node fit_surrogate_posterior/while/body/_22/fit_surrogate_posterior/while/StatefulPartitionedCall/monte_carlo_variational_loss/expectation/loop_body/PartitionedCall/pfor/PartitionedCall/Cholesky/pfor/Cholesky}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_run_approximation_207085]

Function call stack:
run_approximation -> run_approximation

@fonnesbeck (Member):
Can you rebase to current master so that all the covariance functions are available?

@Sayam753 changed the title from "WIP: Add Full Rank Approximation" to "Add Full Rank Approximation" on Jul 27, 2020
Comment on lines +136 to +138:

    dtype = dtype_util.common_dtype(
        self.state.all_unobserved_values.values(), dtype_hint=tf.float64
    )
@Sayam753 (Member, Author) commented:

While figuring out the reasons for the Cholesky decomposition failure, this comment has been very helpful. Here are a few observations from experimenting with VI and Gaussian Processes (a quick reproduction of the precision issue is sketched after this list):

  1. High learning rates (on the order of 1e-2) lead to Cholesky decomposition errors.
  2. Both Mean Field and Full Rank ADVI also hit the decomposition error with tf.float32, even if we add a small WhiteNoise to the diagonal. Testing ADVI with tf.float64 gives much greater numerical stability of the covariance matrix, so I changed the dtype_hint to tf.float64.
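
As a hedged illustration of that float32 sensitivity (not code from this PR), a near-singular GP-style kernel matrix decomposes fine in float64 with a small diagonal jitter, but typically triggers the same Cholesky error in float32:

    import numpy as np
    import tensorflow as tf

    # squared-exponential kernel at closely spaced inputs: positive definite
    # in exact arithmetic, but its smallest eigenvalues fall far below
    # float32 precision
    x = np.linspace(0.0, 1.0, 50)
    cov = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.5 ** 2)

    # float64 plus a small jitter on the diagonal (the "WhiteNoise") is stable
    tf.linalg.cholesky(tf.constant(cov + 1e-6 * np.eye(50), dtype=tf.float64))

    # float32 typically fails with "Cholesky decomposition was not successful."
    try:
        tf.linalg.cholesky(tf.constant(cov, dtype=tf.float32))
    except tf.errors.InvalidArgumentError as err:
        print(err.message)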

@Sayam753 requested a review from @ferrine on Jul 27, 2020 at 07:44
@ferrine (Member) left a review comment:

Looks good. We had a long conversation figuring out the source of the instabilities in the park_bias model, and it looks like something is wrong with the Cholesky implementation in float32 mode. We double-checked possible sources of numerical instability in the algorithm itself and failed to spot an obvious error. I'd rather leave this investigation on an ongoing basis; I hope most models do not suffer from these problems, or we can spot the source of the problem later.

@ferrine (Member) commented Jul 27, 2020

I'll also leave this experiment by @Sayam753 here:
[screenshot: experiment results]

@ferrine merged commit 4097b8a into pymc-devs:master on Jul 27, 2020
@twiecki (Member) commented Jul 27, 2020

This is a major feature, congrats @Sayam753!

@ferrine should we open an issue to keep track of the float32 Cholesky instability?

@twiecki (Member) commented Jul 27, 2020

In another PR I would add full rank to the example notebook, which still reads "Full Rank ADVI - Coming Soon".
