Timeseries models derived from generative graph #642

juanitorduz · 2024-02-28T09:27:49Z

As discussed with @ricardoV94 I will port the gist https://gist.github.com/ricardoV94/a49b2cc1cf0f32a5f6dc31d6856ccb63#file-pymc_timeseries_ma-ipynb into the PyMC Example Gallery. I will add text and explanation to the existing working code :)

📚 Documentation preview 📚: https://pymc-examples--642.org.readthedocs.build/en/642/

review-notebook-app · 2024-02-28T09:27:55Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

juanitorduz · 2024-02-28T13:54:55Z

I think this one is ready for a first review round.

I have not been able to make the scan reference work with {func}~pytensor.scan.basic.scan nor {func}~pytensor.scan ... any tips? Thanks

review-notebook-app · 2024-03-11T18:53:03Z

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2024-03-11T18:53:02Z
----------------------------------------------------------------

In ~~This~~ this notebook, we show how to model
I would motivate more why you would do that, instead of just using pm.AR -- what are the benefits? In which cases is it worth the trouble dealing with pytensor and scan directly?

ricardoV94 commented on 2024-03-13T09:58:34Z
----------------------------------------------------------------

Agree with Alex. The main motivation is this framework allows you to define many arbitrary timeseries, not just things that are pre-packaged in PyMC. For the AR example, one could for instance add different Noise (StudentT) or covariates that change over time...

ricardoV94 commented on 2024-03-13T10:07:00Z
----------------------------------------------------------------

Maybe add a second more complex example, either MA2 https://gist.github.com/ricardoV94/a49b2cc1cf0f32a5f6dc31d6856ccb63#file-pymc_timeseries_ma-ipynb or one of those Jesse wrote here https://gist.github.com/jessegrabowski/ccda08b8a758f882f5794b8b89ace07a ?

jessegrabowski commented on 2024-03-13T10:28:06Z
----------------------------------------------------------------

I actually disagree, I think an AR(2) is a fine choice. I was going to put suggestions for other models here (ARIMA-GARCH or ETS), but I actually think it's better to keep this notebook really simple and focus on the machinery, which is quite complex.

ricardoV94 commented on 2024-03-13T10:51:14Z
----------------------------------------------------------------

Showing a non-recursive time varying parameter could be useful though? Can split into two separate notebooks?

jessegrabowski commented on 2024-03-13T10:54:36Z
----------------------------------------------------------------

I think that's a good 2nd example, because it also serves as a tutorial on the difference between outputs_info,sequences,and non_sequences

Even if it's not a time-varying parameter, maybe an example that shows how to combine an exogenous regression with an AR model, so you're just scanning in some covariate data and doing a linear model with AR distributed errors.

juanitorduz commented on 2024-05-06T12:17:35Z
----------------------------------------------------------------

Maybe add a second more complex example, either MA2?

I suggest we keep this notebook simple and work out other more complex examples in a different notebook (I can also work on it). In my experience, the first time an user sees these models can be overwhelming, so let's keep it simple for this one :D

review-notebook-app · 2024-03-11T18:53:04Z

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2024-03-11T18:53:03Z
----------------------------------------------------------------

I think we need to explicit what collect_default_updates does, as well as the scan API, which is always tricky.
Also explain what ar_init's type should be.
taps is like a countdown?

ricardoV94 commented on 2024-03-13T10:01:31Z
----------------------------------------------------------------

Re: collect_default_updates, it tells PyMC that the RV in the generative graph should be updated in every iteration of the loop. Agree with adding more context on how Scan is defined and linking to the PyTensor docs for a deeper dive: https://pytensor.readthedocs.io/en/latest/library/scan.html

juanitorduz commented on 2024-05-06T13:34:35Z
----------------------------------------------------------------

Addd more info!

review-notebook-app · 2024-03-11T18:53:05Z

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2024-03-11T18:53:04Z
----------------------------------------------------------------

Explain why you set all the observed data nodes to 0

jessegrabowski commented on 2024-03-13T10:51:13Z
----------------------------------------------------------------

Why is there an observed value for the initial condition? We never observe this by definition.

ricardoV94 commented on 2024-03-13T12:50:51Z
----------------------------------------------------------------

I don't see why you can't observe it?

jessegrabowski commented on 2024-03-13T13:41:37Z
----------------------------------------------------------------

Because the first observation in the data is $x_0$, so the initial conditions $x_{-1}$ and $x_{-2}$ are by definition unobserved

The way this model is written, it assumes that the first observations of the data are generated by some arbitrary normal distribution, which then go on to spontaneously kick-off an autoregressive process that describes the rest of the data. This isn't logical. The correct definition of the model should consider all observed data as part of the autoregressive process

juanitorduz commented on 2024-05-06T13:35:35Z
----------------------------------------------------------------

@jeseegrabowski I used the code you shareed on discourse and that is why I added you as a co-author :)

review-notebook-app · 2024-03-11T18:53:06Z

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2024-03-11T18:53:05Z
----------------------------------------------------------------

Can't we get rid of the observed in the first model, then use pm.observe here? That makes for a clearer API and experience for the readers

ricardoV94 commented on 2024-03-13T10:03:25Z
----------------------------------------------------------------

Agree. Let me know if something breaks

jessegrabowski commented on 2024-03-13T13:46:49Z
----------------------------------------------------------------

I blatantly plagiarized this notebook and used pm.observe in a discourse thread here, maybe it could be helpful.

juanitorduz commented on 2024-05-06T13:35:51Z
----------------------------------------------------------------

Fixed in 67ec83d

review-notebook-app · 2024-03-11T18:53:07Z

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2024-03-11T18:53:06Z
----------------------------------------------------------------

"we see the model is capturing the global dynamics of the time series. In order to have a ~~abetter~~ a better insight of the model"

jessegrabowski commented on 2024-03-13T10:24:47Z
----------------------------------------------------------------

I think a discussion of conditional and unconditional posteriors is needed here. Many users will be surprised by this posterior because they are used to seeing conditional one-step forecasts, $p(x_t | \{x_\tau\}_{\tau=0}^{t-1})$ (what you get in statsmodels/stata/everything), which are much tighter and follow the data more closely.

At the risk of scope-creep, I think it's also important to show users how to use a predictive model to get the conditional posterior. It would also be the first place in pymc-examples that shows how to use a predictive model -- up until now we only have the labs blog.

AlexAndorra

Love it @juanitorduz ! Added some comments in ReviewNB for the next round

ricardoV94 · 2024-03-13T09:58:36Z

Agree with Alex. The main motivation is you can this framework allows you to define many arbitrary time-series, not just things that are pre-packagend in PyMC. For the AR example, one could for instance add different Noise (StudentT) or covariates that change over time...

View entire conversation on ReviewNB

ricardoV94 · 2024-03-13T10:01:33Z

Re: collect_default_updates, it tells PyMC that the RV in the generative graph should be updated in every iteration of the loop. Agree with adding more context on how Scan is defined and linking to the PyTensor docs for a deeper dive: https://pytensor.readthedocs.io/en/latest/library/scan.html

View entire conversation on ReviewNB

ricardoV94 · 2024-03-13T10:02:33Z

May be a good excuse to use pm.observe instead of the dummy MutableData

View entire conversation on ReviewNB

ricardoV94 · 2024-03-13T10:03:26Z

Agree. Let me know if something breaks

View entire conversation on ReviewNB

ricardoV94 · 2024-03-13T10:07:01Z

Maybe add a second more complex example, either MA2 https://gist.github.com/ricardoV94/a49b2cc1cf0f32a5f6dc31d6856ccb63#file-pymc_timeseries_ma-ipynb or one of those Jesse wrote here https://gist.github.com/jessegrabowski/ccda08b8a758f882f5794b8b89ace07a ?

View entire conversation on ReviewNB

jessegrabowski · 2024-03-13T10:24:48Z

I think a discussion of conditional and unconditional posteriors is needed here. Many users will be surprised by this posterior because they are used to seeing conditional one-step forecasts, $p(x_t | \{x_\tau\}_{\tau=0}^{t-1})$ (what you get in statsmodels/stata/everything), which are much tighter and follow the data more closely.

At the risk of scope-creep, I think it's also important to show users how to use a predictive model to get the conditional posterior. It would also be the first place in pymc-examples that shows how to use a predictive model -- up until now we only have the labs blog.

View entire conversation on ReviewNB

jessegrabowski · 2024-03-13T10:28:08Z

I actually disagree, I think an AR(2) is a fine choice. I was going to put suggestions for other models here (ARIMA-GARCH or ETS), but I actually think it's better to keep this notebook really simple and focus on the machinery, which is quite complex.

View entire conversation on ReviewNB

jessegrabowski · 2024-03-13T10:51:14Z

Why is there an observed value for the initial condition? We never observe this by definition.

View entire conversation on ReviewNB

ricardoV94 · 2024-03-13T10:51:15Z

Showing a non-recursive time varying parameter could be useful though? Can split into two separate notebooks?

View entire conversation on ReviewNB

jessegrabowski · 2024-03-13T10:54:37Z

I think that's a good 2nd example, because it also serves as a tutorial on the difference between outputs_info,sequences,and non_sequences

View entire conversation on ReviewNB

ricardoV94 · 2024-03-13T12:50:52Z

I don't see why you can't observe it?

View entire conversation on ReviewNB

jessegrabowski · 2024-03-13T13:41:39Z

Because the first observation in the data is $x_0$, so the initial conditions $x_{-1}$ and $x_{-2}$ are by definition unobserved

View entire conversation on ReviewNB

jessegrabowski · 2024-03-13T13:46:50Z

I blatently plagerized this notebook and used pm.observe in a discourse thread here, maybe it could be helpful.

View entire conversation on ReviewNB

juanitorduz · 2024-05-05T18:49:19Z

Thank you all for the feedback! Now that the HSGP example was merged I can come back to this one (there are no dependencies but I was busy with other stuff 🫠). Apologies for the delay 🙈😄.

ricardoV94 · 2024-05-06T08:42:02Z

Thank you all for the feedback! Now that the HSGP example was merged I can come back to this one (there are no dependencies but I was busy with other stuff 🫠). Apologies for the delay 🙈😄.

Awesome!

juanitorduz · 2024-06-17T20:39:08Z

Thanks! Done

View entire conversation on ReviewNB

juanitorduz · 2024-06-17T20:39:23Z

I added some explanation.

View entire conversation on ReviewNB

juanitorduz · 2024-06-17T20:39:33Z

Done!

View entire conversation on ReviewNB

juanitorduz · 2024-06-17T20:39:52Z

Done! Thanbks for the reference!

View entire conversation on ReviewNB

juanitorduz · 2024-06-17T20:47:00Z

Added!

View entire conversation on ReviewNB

juanitorduz · 2024-06-17T20:57:57Z

yes!

View entire conversation on ReviewNB

juanitorduz · 2024-06-17T21:00:12Z

yes!

View entire conversation on ReviewNB

juanitorduz · 2024-06-17T21:12:00Z

I think as the conditional step has n_steps=trials - lags and the forecasting model has a generic n_steps=forecast_steps it is not that straightforward right?

View entire conversation on ReviewNB

juanitorduz · 2024-06-17T21:12:59Z

Any suggested values ? Changing the seed does not change that much and the rho values go very close to zero many times :D

View entire conversation on ReviewNB

juanitorduz · 2024-06-17T21:23:20Z

@OriolAbril, I am able to add links to the classes, but the functions and methods won't work for some reason ... do you have any tip :)

Concretely, {meth}`~pymc.model.transform.conditioning.observe` does not work (also tried with the scan function in PyTernsor)

ricardoV94 · 2024-06-18T12:07:45Z

I don't know if this can be recovered, but perhaps worth a shot? https://colab.research.google.com/drive/1yLrxTBRPa08B8EIEh6NGWG_aLFxIbanh?usp=sharing

View entire conversation on ReviewNB

ricardoV94 · 2024-06-18T12:08:30Z

Does that matter? n_steps can also be a Data variable that you change?

View entire conversation on ReviewNB

jessegrabowski · 2024-06-18T12:31:36Z

You could pick parameters that are 1) strongly persistent, and 2) give imaginary eigenvalues and generate oscillating trajectories. For example, rho_1 = 0.99, rho_2 = -0.99/4

View entire conversation on ReviewNB

OriolAbril · 2024-06-18T12:45:31Z

@juanitorduz These seem particular problems.

pytensor.scan is explicitly not indexed within the pytensor docs as a function: https://github.com/pymc-devs/pytensor/blob/main/doc/library/scan.rst?plain=1#L679. My guess from a quick look at the blame is theano probably had any type references which attempt to automatically resolve to any of the possible types, but pytensor.scan is both a module and a function in theano/aesara/pytensor and it seems in aesara times :noindex: was added in multiple places to get those to "work" again.

It looks like the main alternative right now is using {mod}`pytensor.scan` (which is the indexed entry for pytensor.scan). Another option could be keeping func and fixing it on pytensor end, or using the ref type cross-reference to the top of the page, that is `` {ref}scan <lib_scan>

For pymc.model.transform.conditioning.observe there doesn't seem to be anything remotely close to that in the pymc docs: https://www.pymc.io/projects/docs/en/latest/api/model.html. If I do look at the source though, I do see a model/conditioning entry which doesn't exist within that folder: https://github.com/pymc-devs/pymc/tree/main/docs/source/api/model, and the transform file is missing, so my guess is it was renamed but the toctree entry not updated, so it is not part of the website navigation tree anymore.

Here you should leave the reference correctly added and then it needs to be fixed on pymc end (once that is done, regenerating the examples will fix the issue without any extra work). Regarding fixing on the pymc docs, do we really expect users to use pymc.model.transform.conditioning.observe when they import observe?

If not, use the actual import path in the reference, and when fixing it document it where the users are expected to import it from (in this case pymc.observe as it is done in the notebook itself), the file structure of the library is completely irrelevant to users and should have no place in the public API docs.

If yes, then the notebook would need to be updated to use that.

juanitorduz · 2024-06-28T16:38:40Z

Thank you very much @OriolAbril ! (and apologies for the late reply, these last two weeks have been hectic 🫠)

ricardoV94 · 2024-07-15T11:25:20Z

@juanitorduz let us know if you need anything else from us at the moment

juanitorduz · 2024-07-15T11:27:55Z

Apologies @ricardoV94 🙈. I just need some time... Thinks are busy at this stage 🫠. Thanks for checking in! This PR is still in my todo list.

I think the main content is there: I just need to address the two open points:

write a function to reuse code
change the input series to make it less volatile

review-notebook-app · 2024-07-28T05:49:26Z

View / edit / reply to this conversation on ReviewNB

jessegrabowski commented on 2024-07-28T05:49:25Z
----------------------------------------------------------------

Line #2.    trials = 100  # Time series length

Poke to push this over the finish line, but also a real point:

I really dislike that the time series length is a global variable hidden away here. It seems to me like it should be inferred by the either the shape of the observed data, or the requested size/shape of the CustomDist.In the example gist I linked in another comment, I did something like:

def ar_dist(ar_init, rho, sigma, size):
ar_innov, _ = pytensor.scan(
       fn=ar_step,
       outputs_info=[{"initial": ar_init, "taps": range(-lags, 0)}],
       non_sequences=[rho, sigma],
       n_steps=size[0],
       strict=True,

   )
   return ar_innov

But this no longer works. I made this comment because I came back to this code for a project, and I was getting an error about NoneTensor types. Not sure if I did something wrong, or if it can be easily fixed.

If you insist on using a global for n_steps,I would argue it should have a different name, like n_timesteps, T, or timeseries_length."trials" is a name very disconnected from what it represents.

juanitorduz · 2024-07-28T05:54:51Z

Next week I'll have the time to work on this one 🙏

juanitorduz · 2024-07-28T20:44:57Z

@ricardoV94 @jessegrabowski Sorry for the late reply, but I am particularly busy (not just work, but kids and moving and...). Anyway, I cleaned up and updated the PyMC version (5.16). Regarding the open points:

@jessegrabowski trials - > timeseries_length I this the change ✅ . I agree that having it as a global variable is not ideal, but it serves the example's illustrative purposes. This was a gist I received from @ricardoV94, and changing it to have it as a function will take more time, and I think we should work on iterations.
@ricardoV94 Combining the conditional and forecasting distributions via the do operator is not as straightforward because the scan outputs are different (one uses sequences and the other utputs_info). Even though I am sure one can combine them, for me, learning this material, having them separate and more "verbose" makes it easier to follow (still, this is a personal opinion)

@ricardoV94 Making the series smoother: I could use @jessegrabowski 's suggestion of setting the values for rho directly. But then I would need to generate the series again instead of using the prior samples, which I believe was the idea of the initial gist. This, again, would require more work for the first iteration.

All in all, I think this notebook is in a very good state for the first iteration. I suggest we merge this one, and then we can take over additional enhancements in future PRs. This is very cool material and needs to be exposed to the users ;)

WDYT?

(Again, apologies for the time constraints to address all the comments in this iteration 🙏 )

ricardoV94 · 2024-08-05T18:13:10Z

@juanitorduz sounds good with me

twiecki · 2024-08-07T09:27:00Z

Let's make sure we advertise this one. Do you want to post it @juanitorduz?

juanitorduz · 2024-08-07T09:54:01Z

I will do so!

juanitorduz marked this pull request as draft February 28, 2024 09:27

juanitorduz self-assigned this Feb 28, 2024

ricardoV94 requested a review from jessegrabowski February 28, 2024 13:29

juanitorduz marked this pull request as ready for review February 28, 2024 13:54

juanitorduz requested a review from ricardoV94 February 28, 2024 13:55

AlexAndorra requested changes Mar 11, 2024

View reviewed changes

juanitorduz added 2 commits May 6, 2024 14:01

nb init

243f366

intro

78cd6b9

feedback part 2.1

28b917d

typos

01fb650

juanitorduz added 3 commits July 28, 2024 11:03

rename time series length

b2433dd

cleanup

2d9c6c1

update pymc

333698f

twiecki merged commit 5501dfd into pymc-devs:main Aug 7, 2024
2 checks passed

juanitorduz deleted the time_series_graph branch August 7, 2024 09:53

Timeseries models derived from generative graph #642

Timeseries models derived from generative graph #642

Conversation

juanitorduz commented Feb 28, 2024 • edited by github-actions bot Loading

review-notebook-app bot commented Feb 28, 2024

juanitorduz commented Feb 28, 2024 • edited Loading

review-notebook-app bot commented Mar 11, 2024 • edited Loading

review-notebook-app bot commented Mar 11, 2024 • edited Loading

review-notebook-app bot commented Mar 11, 2024 • edited Loading

review-notebook-app bot commented Mar 11, 2024 • edited Loading

review-notebook-app bot commented Mar 11, 2024 • edited Loading

AlexAndorra left a comment

Choose a reason for hiding this comment

ricardoV94 commented Mar 13, 2024

ricardoV94 commented Mar 13, 2024

ricardoV94 commented Mar 13, 2024

ricardoV94 commented Mar 13, 2024

ricardoV94 commented Mar 13, 2024

jessegrabowski commented Mar 13, 2024

jessegrabowski commented Mar 13, 2024

jessegrabowski commented Mar 13, 2024

ricardoV94 commented Mar 13, 2024

jessegrabowski commented Mar 13, 2024

ricardoV94 commented Mar 13, 2024

jessegrabowski commented Mar 13, 2024

jessegrabowski commented Mar 13, 2024

juanitorduz commented May 5, 2024 • edited Loading

ricardoV94 commented May 6, 2024

juanitorduz commented Jun 17, 2024

juanitorduz commented Jun 17, 2024

juanitorduz commented Jun 17, 2024

juanitorduz commented Jun 17, 2024

juanitorduz commented Jun 17, 2024

juanitorduz commented Jun 17, 2024

juanitorduz commented Jun 17, 2024

juanitorduz commented Jun 17, 2024

juanitorduz commented Jun 17, 2024

juanitorduz commented Jun 17, 2024 • edited by OriolAbril Loading

ricardoV94 commented Jun 18, 2024

ricardoV94 commented Jun 18, 2024

jessegrabowski commented Jun 18, 2024

OriolAbril commented Jun 18, 2024

juanitorduz commented Jun 28, 2024

ricardoV94 commented Jul 15, 2024

juanitorduz commented Jul 15, 2024 • edited Loading

review-notebook-app bot commented Jul 28, 2024 • edited Loading

juanitorduz commented Jul 28, 2024

juanitorduz commented Jul 28, 2024

ricardoV94 commented Aug 5, 2024

twiecki commented Aug 7, 2024

juanitorduz commented Aug 7, 2024

juanitorduz commented Feb 28, 2024 •

edited by github-actions bot

Loading

juanitorduz commented Feb 28, 2024 •

edited

Loading

review-notebook-app bot commented Mar 11, 2024 •

edited

Loading

review-notebook-app bot commented Mar 11, 2024 •

edited

Loading

review-notebook-app bot commented Mar 11, 2024 •

edited

Loading

review-notebook-app bot commented Mar 11, 2024 •

edited

Loading

review-notebook-app bot commented Mar 11, 2024 •

edited

Loading

juanitorduz commented May 5, 2024 •

edited

Loading

juanitorduz commented Jun 17, 2024 •

edited by OriolAbril

Loading

juanitorduz commented Jul 15, 2024 •

edited

Loading

review-notebook-app bot commented Jul 28, 2024 •

edited

Loading