
Add first draft of Bernstein polynomial flow #32

Closed
wants to merge 1 commit

Conversation

oduerr
Contributor

@oduerr oduerr commented Dec 7, 2023

Hello,

I am a co-author of the Bernstein flows paper (https://arxiv.org/abs/2004.00464), in which we constructed a one-dimensional flow based on Bernstein polynomials. We already have a TF implementation.
This flow would fit nicely into the Zuko framework as an alternative to NSF. I am not an expert in software development or PyTorch, but I managed to write a transformation and also a flow. The test_transform test passes and I have used the flow successfully for training, but I still get a failure in the unit test (test_flows) at the following position:

for p in flow.parameters():
    assert p.grad is not None

I tried hard but could not find any reason why this is failing. Possibly I lack knowledge of the bells and whistles of your package. Could you please consider this pull request?

Best,
Oliver and Marcel (@MArpogaus)

@francois-rozet
Member

francois-rozet commented Dec 7, 2023

Hello @oduerr, thank you for your PR and contribution! I took a (very) quick look and could not find where the bug comes from, especially since you were able to train successfully. This part of the test ensures that all parameters are used and registered within the computation graph. I'll have to dig deeper, but I'll be flying to NeurIPS tomorrow, so I might not be able to before the end of the conference. Thanks again and sorry for the wait.

@oduerr
Contributor Author

oduerr commented Dec 12, 2023

> Hello @oduerr, thank you for your PR and contribution! I took a (very) quick look and could not find where the bug comes from, especially since you were able to train successfully. This part of the test ensures that all parameters are used and registered within the computation graph. I'll have to dig deeper, but I'll be flying to NeurIPS tomorrow, so I might not be able to before the end of the conference. Thanks again and sorry for the wait.

Hello @francois-rozet,

thank you very much for your fast reply (and sorry for my late answer). Concerning training, I made a small notebook (https://github.com/oduerr/zuko/blob/bernstein/oliver_tester/bern_tester.ipynb) demonstrating successful training of an unconditional 1-D distribution.

A side remark concerning the training: if I understand the framework correctly, it is built so that during training the inverse is used (the flow goes from "latent to data"). Is it possible to specify the flow so that no inverse is used during training (the flow goes from "data to latent")? It would be great if one could choose the direction and trade off training vs. sampling speed.

No pressure with your reply; enjoy NeurIPS first!

Best,
Oliver

@MArpogaus
Contributor

Hi @francois-rozet,

Thank you for your prompt response.
Is there anything we can do to assist in resolving the issues and merge this PR?

Best regards,
Marcel

@MArpogaus
Contributor

I was just wondering: Could this class be used to train using the forward transformation, as @oduerr suggests?

The relevant part of my TFP implementation can be found here.

@francois-rozet
Member

francois-rozet commented Jan 12, 2024

Hello @oduerr and @MArpogaus, I will take a look at this now.

> A side remark concerning the training: if I understand the framework correctly, it is built so that during training, the inverse is used (Flow goes from "Latent to Data").

It is actually the opposite: in Zuko, the forward direction is assumed to be "data to latent". This is because the inverse of some transformations is not differentiable.

> It would be great if one could choose the direction and trade off training vs. sampling.

It is actually already possible, and it is indeed related to LazyInverse, @MArpogaus! Basically, if you have a LazyTransform, you can reverse it with .inv.

For example, in the training-from-energy tutorial, I use .inv to reverse the transformation of an autoregressive NSF and then train by sampling (which is now fast).
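To make the idea concrete, here is a minimal pure-Python sketch of the principle behind a lazy inverse (this is illustrative only, not Zuko's actual code): wrapping a transform in an object that swaps its forward and inverse lets you choose which direction is used for training and which for sampling.

```python
class Transform:
    """A toy invertible map f(x) = 2x + 1."""

    def __call__(self, x):
        return 2.0 * x + 1.0

    def inverse(self, y):
        return (y - 1.0) / 2.0

    @property
    def inv(self):
        return Inverse(self)


class Inverse:
    """Wraps a transform and swaps its forward and inverse directions."""

    def __init__(self, t):
        self.t = t

    def __call__(self, x):
        return self.t.inverse(x)

    def inverse(self, y):
        return self.t(y)

    @property
    def inv(self):
        # Inverting twice recovers the original transform.
        return self.t


f = Transform()
g = f.inv

print(g(f(3.0)))  # round-trips back to 3.0
```

The double-inversion shortcut (`f.inv.inv is f`) avoids stacking wrappers, which is the kind of convention a lazy-transform API can exploit.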

@francois-rozet
Member

@oduerr I have an issue. I cannot pull your PR on my computer because the branch name contains an emoji 😆 Would it be possible to rename your branch (without the emoji) and re-submit a PR if necessary?

```python
bound = 10,
**kwargs,
):
    super().__init__(self.f, phi=theta_un, bound=bound, **kwargs)
```
@francois-rozet
Member

Here is the bug 🐛: phi should be a list of parameters (in this case (theta_un,)). Otherwise I think everything works, good job! I will refactor the code a bit to make it compliant with Zuko's conventions.
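A minimal pure-Python analogy of the bug's shape (this is not Zuko's actual code; the class names are hypothetical): the receiver iterates over `phi` to register each parameter, so passing a bare parameter instead of a one-element tuple causes its raw values to be iterated over, and the parameter object itself is never registered, which is consistent with `p.grad` being `None` in the failing test.

```python
class Parameter:
    """Stands in for a learnable parameter: an iterable container of values."""

    def __init__(self, values):
        self.values = values

    def __iter__(self):
        # Iterating yields the raw values, not the Parameter itself,
        # mimicking how iterating a tensor yields its rows.
        return iter(self.values)


class Module:
    def __init__(self, phi):
        # `phi` is expected to be an iterable *of parameters*.
        self.registered = list(phi)


theta_un = Parameter([0.1, 0.2, 0.3])

good = Module(phi=(theta_un,))
assert good.registered == [theta_un]   # the Parameter itself is registered

bad = Module(phi=theta_un)
assert theta_un not in bad.registered  # only its raw values leaked in
```

The fix is simply `phi=(theta_un,)`, so the registration loop sees the parameter object itself.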

@francois-rozet
Member

francois-rozet commented Jan 12, 2024

Hello again @oduerr and @MArpogaus. As I was unable to pull this PR, I copied the code and created a branch (#33) on the Zuko repo. In addition to fixing the bug with phi, I have made some changes:

  1. Since it is a polynomial flow, I moved it to the zuko.flows.polynomial module instead of zuko.flows.spline. I also renamed it BPF (Bernstein polynomial flow) to follow Zuko's conventions.
  2. I used torch.arange(1, degree + 1) instead of torch.tensor(range(1, degree)) to create $\alpha$ and $\beta$.
  3. I used the sigmoid function $\sigma(x) = \frac{1}{1 + \exp(-x)}$ to map the inputs to $[0, 1]$ instead of $\frac{x + B}{2B}$. This ensures that the domain is $\mathbb{R}$. In practice, it is only invertible for the domain $[-B, B]$ because these are the initial bounds of the bisection method, but it is still a bijection over $\mathbb{R}$.

Otherwise, it's the same. Great job! Tell me if you see a mistake in #33 or want something to be changed.
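The change in point 3 can be sketched in a few lines of plain Python (an illustrative sketch, not the Zuko implementation): the input is squashed to $[0, 1]$ with the sigmoid, and a Bernstein polynomial with increasing coefficients is then strictly increasing over the whole real line.

```python
import math


def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x)) maps R onto (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def bernstein(theta, z):
    # B(z) = sum_k theta[k] * C(n, k) * z^k * (1 - z)^(n - k), for z in [0, 1].
    n = len(theta) - 1
    return sum(
        t * math.comb(n, k) * z**k * (1.0 - z) ** (n - k)
        for k, t in enumerate(theta)
    )


def transform(theta, x):
    # Increasing theta => increasing Bernstein polynomial, so composing it
    # with the (increasing) sigmoid yields a bijection over all of R.
    return bernstein(theta, sigmoid(x))


theta = [-2.0, -0.5, 0.5, 2.0]  # strictly increasing coefficients
ys = [transform(theta, x) for x in (-5.0, -1.0, 0.0, 1.0, 5.0)]
assert all(a < b for a, b in zip(ys, ys[1:]))  # monotone increasing
```

As noted above, inversion in practice still relies on bisection within initial bounds $[-B, B]$, but the map itself is a bijection over $\mathbb{R}$.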

@oduerr
Contributor Author

oduerr commented Jan 14, 2024 via email

@francois-rozet
Member

francois-rozet commented Jan 14, 2024

> But we sometimes found it beneficial to linearly map $[-B,B]$ to $[0,1]$ (especially when the bounds are known). Would it be possible to have the sigmoid as a safe default setting but also have the option to linearly map $[-B,B]$ to $[0,1]$?

Yes, it is possible to add the option to switch to a linear map. However, because the output of the transformation is unbounded, stacking several Bernstein transformations would likely fail during training.

I like your proposition of linearly extrapolating outside of the bounds better. This is what spline transformations do actually. However, for splines it is trivial to compute the values and derivatives at the bounds. For your case, I don't see how to do it cheaply/easily.

I propose to add the option to switch to a linear map instead of a sigmoid (+ assert that inputs are in bounds) in #33 and merge the PR. Then you can submit a new PR (or an issue if you don't know how to tackle this either) that adds the linear extrapolation feature later on. WDYT?

> I am impressed with the speed of Zuko; none of our implementations of these flows were as fast. Thanks!

Thanks!

@MArpogaus
Contributor

MArpogaus commented Jan 16, 2024

> I like your proposition of linearly extrapolating outside of the bounds better. This is what spline transformations do actually. However, for splines it is trivial to compute the values and derivatives at the bounds. For your case, I don't see how to do it cheaply/easily.

Extrapolating the Bernstein polynomial is trivial when exploiting its properties.
I did this before for the TF implementation and will hopefully find some time next week to look into it and create a new PR.
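The relevant properties are standard: a Bernstein polynomial's values and derivatives at the endpoints depend only on its first and last coefficients, namely $B(0) = \theta_0$, $B(1) = \theta_n$, $B'(0) = n(\theta_1 - \theta_0)$, and $B'(1) = n(\theta_n - \theta_{n-1})$. A minimal sketch of linear extrapolation using these facts (illustrative only, not the TF implementation):

```python
import math


def bernstein(theta, z):
    # B(z) = sum_k theta[k] * C(n, k) * z^k * (1 - z)^(n - k), for z in [0, 1].
    n = len(theta) - 1
    return sum(
        t * math.comb(n, k) * z**k * (1.0 - z) ** (n - k)
        for k, t in enumerate(theta)
    )


def bernstein_extrapolated(theta, z):
    # Endpoint properties of Bernstein polynomials:
    #   B(0) = theta[0],  B'(0) = n * (theta[1]  - theta[0])
    #   B(1) = theta[-1], B'(1) = n * (theta[-1] - theta[-2])
    # so continuing linearly outside [0, 1] is cheap and C^1-continuous.
    n = len(theta) - 1
    if z < 0.0:
        return theta[0] + n * (theta[1] - theta[0]) * z
    if z > 1.0:
        return theta[-1] + n * (theta[-1] - theta[-2]) * (z - 1.0)
    return bernstein(theta, z)


theta = [0.0, 1.0, 3.0]  # here B(z) = 2z + z^2 on [0, 1]
assert bernstein_extrapolated(theta, -1.0) == -2.0  # slope B'(0) = 2
assert bernstein_extrapolated(theta, 2.0) == 7.0    # slope B'(1) = 4
```

Since the endpoint slopes are positive whenever the coefficients are increasing, the extrapolated map stays monotone and is trivially invertible outside the bounds.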

@oduerr: Maybe we can have a call at the beginning of next week to discuss the matter?

> I propose to add the option to switch to a linear map instead of a sigmoid (+ assert that inputs are in bounds) in #33 and merge the PR. Then you can submit a new PR (or an issue if you don't know how to tackle this either) that adds the linear extrapolation feature later on. WDYT?

Sounds good to me. Thanks :)

One more general question: is there code available where you evaluate typical flows on established benchmark data, like the data provided by @gpapamak here?

If not, here is a self-contained script to download, verify, and prepare the data mentioned above. I will use it as a starting point for some comparisons.

# removed for readability

@francois-rozet
Member

francois-rozet commented Jan 16, 2024

> Sounds good to me. Thanks :)

Can I merge the PR #33 then?

> Is there code available where you evaluate typical flows on established benchmark data like the one provided by @gpapamak here?

You can check https://github.com/francois-rozet/uci-datasets to download the data and the benchmark branch for benchmarks on toy problems. It's a work in progress though.

P.S. Could you host your script somewhere else (e.g. a GitHub gist)? It makes it hard to navigate the discussion 😅

@oduerr
Contributor Author

oduerr commented Jan 18, 2024

> Sounds good to me. Thanks :)
>
> Can I merge the PR #33 then?
>
> Is there code available where you evaluate typical flows on established benchmark data like the one provided by @gpapamak here?
>
> You can check https://github.com/francois-rozet/uci-datasets to download the data and the benchmark branch for benchmarks on toy problems. It's a work in progress though.
>
> P.S. Could you host your script somewhere else (e.g. a GitHub gist)? It makes it hard to navigate the discussion 😅

Sorry for the late reply. Yes, you can merge PR #33. We will work on the extrapolation on Monday, but this could become another pull request. Thanks!

@francois-rozet
Member

francois-rozet commented Jan 25, 2024

Hello @oduerr and @MArpogaus, I have merged #33 🥳 and added you as authors. I will soon release a new version (1.1.0) of Zuko including BPF and other new features. Notably, it is now possible to instantiate unconditional univariate flows (features=1, context=0), which, as you noticed in your notebook, was previously impossible.

I think we can now close the present PR and you are welcome to open a new issue/PR for the additional features we discussed.
