-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rewrite radon notebook using ArviZ and xarray #3963
Conversation
Check out this pull request on Review Jupyter notebook visual diffs & provide feedback on notebooks. Powered by ReviewNB |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2020-06-14T03:11:15Z I am not sure if I am using xarray explicitly or not, I think not, but I am relying heavily on it so even if there are no |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2020-06-14T03:11:16Z I just want to highlight this couple of lines that apply hdi to xarray groupby objects |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2020-06-14T03:11:17Z Note to self: I forgot to sort here and half of the plot is inverted: basement to the right, floor to the left |
I think that the part that has already been updated is ready for review, I hope all xarray tricks are clear and do not seem too obfuscating. Please comment on explanations being too little (or too much) detailed, names used (variables, coordinates, dimensions...), all comments will be welcome |
Just did a quick skim and this is looking really nice @OriolAbril 👌 A first quick comment: you can use |
View / edit / reply to this conversation on ReviewNB ColCarroll commented on 2020-06-17T02:20:20Z can get rid of some white space here OriolAbril commented on 2020-06-17T17:13:10Z There is only one line between model context and plotting, the rest are reviewNB artifacts |
View / edit / reply to this conversation on ReviewNB ColCarroll commented on 2020-06-17T02:20:21Z "Radon" is lower case twice, plus missing a period at the end of the paragraph) |
View / edit / reply to this conversation on ReviewNB ColCarroll commented on 2020-06-17T02:20:22Z this is funny that the previous run had only 3 cores! also, in the paragraph below here, would you mind lowercasing "Flash" to "flash"? AlexAndorra commented on 2020-06-17T09:51:01Z This is a nerdy reference to "The Flash", so no lowercase required ;) OriolAbril commented on 2020-06-17T17:14:42Z Maybe I should upercase "The" then? |
View / edit / reply to this conversation on ReviewNB ColCarroll commented on 2020-06-17T02:20:23Z I can't tell if this white space in the cell is just from reviewnb or not... also in the cell below AlexAndorra commented on 2020-06-17T09:51:42Z Yeah, the white space is from ReviewNB. I think they do that do that to ease comparison within cells, but I find it's not yet optimal |
View / edit / reply to this conversation on ReviewNB ColCarroll commented on 2020-06-17T02:20:24Z lowercase Radon -> radon again |
View / edit / reply to this conversation on ReviewNB ColCarroll commented on 2020-06-17T02:20:25Z I wonder what happened to make this take 5 seconds instead of 93 seconds! |
this looks awesome oriol! i'm also excited about alex's change, with the A note I don't feel strongly about, but can make the cells a little cleaner: if you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, did a complete review! Lots of good stuff in there. I had some comments and questions that you'll find below but that shouldn't be too much work 😉 Tell me if anything is unclear!
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-17T09:48:07Z
You may be wondering why we are using the |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-17T09:48:08Z I don't think you need to pass the model kwarg to from_pymc3 since we're inside the model context OriolAbril commented on 2020-06-17T21:28:03Z used to (due to bug in ArviZ), I am updating this so I'll mark as resolved :) |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-17T09:48:09Z
ArviZ
This is nice ;) |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-17T09:48:10Z Let's use
OriolAbril commented on 2020-06-17T21:12:46Z yep, I was waiting until merging the fix for automagical coords (otherwise I would have had to update the code to use return inferencedata and inferencedata kwargs and then again to remove the kwargs) |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-17T09:48:11Z No need for |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-17T09:48:12Z "... workflow into a single object. This will also make the rendering of plots and diagnostics faster -- otherwise ArviZ will need to convert your trace to |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-17T09:48:13Z Not sure the last paragraph needs all the details of the difference between xarray and ArviZ. I would just explain why you need OriolAbril commented on 2020-06-17T21:36:22Z I think this will give more context into xarray which will ease the application of these techniques to different (but still somewhat similar) analylsis. I would not oppose to removing it though. AlexAndorra commented on 2020-06-18T07:59:44Z It's your call honestly. If you keep everything though, maybe emphasize the |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-17T09:48:14Z Very cool stuff ;) |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-17T09:48:15Z pooled_idata.observed_data = pooled_idata.observed_data.assign_coords( {"Level": pooled_idata.posterior.Level[pooled_idata.constant_data.floor_idx]} ).sortby("Level") pooled_idata.observed_data.plot.scatter(x="Level", y="y", label="Observations", alpha=0.4) This is quite complicated compared to the previous implementation: plt.scatter(floor, log_radon, label="Observations", alpha=0.4) If you want to keep for educational purposes of xarray usage, I would briefly explain what this is doing. OriolAbril commented on 2020-06-17T22:02:45Z I want to leave this for 2 motives:
I am moving it to its own cell above to explain what is happening and why. AlexAndorra commented on 2020-06-18T08:01:40Z I agree it's a good educational opportunity. Moving it to its own cell above to explain what is happening and why is the right thing to do IMO |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-17T09:48:16Z This is nice! I would change the xticklabels as "titled" and not as capitalized. That way they'll take less place and maybe you'll be able to put more of them. Also, making the figure bigger vertically would give more space to the plots. |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-17T09:48:17Z Return Inf Data directly and no need for model kwarg |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-17T09:48:18Z Return Inf Data directly and no need for model kwarg |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-17T09:48:19Z use |
This is a nerdy reference to "The Flash", so no lowercase required ;) View entire conversation on ReviewNB |
Yeah, the white space is from ReviewNB. I think they do that do that to ease comparison within cells, but I find it's not yet optimal View entire conversation on ReviewNB |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-18T08:36:48Z "We will multiply |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-18T08:36:49Z "However, we don't have that much of a problem to sample..." |
View / edit / reply to this conversation on ReviewNB AlexAndorra commented on 2020-06-18T08:36:50Z What is OriolAbril commented on 2020-06-18T10:34:03Z It governs how the chain number affects the plotted lines. The default for AlexAndorra commented on 2020-06-18T10:41:22Z Ah nice, didn't know about that! |
It governs how the chain number affects the plotted lines. The default for View entire conversation on ReviewNB |
Ah nice, didn't know about that! View entire conversation on ReviewNB |
I am having some issues with reviewNB so I'll comment here: This plot is the one with most differences between previous version and current implementation -- I strived to make all plots look exactly the same as the original. That is because the implementation is actually different:
Not sure if this is worth mentioning in the notebook, i.e. something along the lines of how seamless it is to average the desired dimensions after all calculations have been done now that we don't have to care about broadcasting rules. |
Do you mean that the results are different, or the code to get them is? In the first case that'd be a problem because it would mean weren't computing the same quantity. In the case, it's fine and well worth highlighting that we don't have to care about broadcasting and shape handling with xarrays 👌 |
results are different! Sorry I was not clear enough and I can't get reviewNB to show the same cell side by side so it is difficult to see the differences. As I was saying above, the original approach was to average first, operate afterwards. This therefore assumes that the expected value of an operation is the operation applied to the expected values of the operands. This is true for sums (and therefore correct for TL;DR: yes, results are different because they should be due to an averaging mistake in the previous version. |
Very clear, thanks 👌 And I realize I wasn't very clear either 😅 |
Looks like there's a lot of verbosity in the cells displaying the xarray objects, might want to suppress those outputs for brevity. |
@kyleabeauchamp I guess you are talking about the html repr which is not rendered by github, the raw text is really long, but it is hidden and can be shown interactively, here is a proper render: https://nbviewer.jupyter.org/github/OriolAbril/pymc3/blob/radon_nb/docs/source/notebooks/multilevel_modeling.ipynb |
I think I have addressed all comments so far, do not hesitate to complain if I missed some comment. I have reexecuted the nb on pymc 3.9.2 and ArviZ 0.9.0 so I think everything is ready to merge now unless there are still more reviews coming which will always be very welcome |
Ha ha I have one last nitpick comment: I think |
Should this be added to release notes? |
I think so, it's a pretty big revamp, so it's good to publicize it 😉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @OriolAbril, all looks awesome now 🎉 I'm in love with this notebook 💘
* Update GP NBs to use standard notebook style (pymc-devs#3978) * update gp-latent nb to use arviz * rerun, run black * rerun after fixes from comments * rerun black * rewrite radon notebook using ArviZ and xarray (pymc-devs#3963) * rewrite radon notebook using ArviZ and xarray Roughly half notebook has been updated * add comments on xarray usage * rewrite 2n half of notebook * minor fix * rerun notebook and minor changes * rerun notebook on pymc3.9.2 and ArviZ 0.9.0 * remove unused import * add change to release notes * SMC: refactor, speed-up and run multiple chains in parallel for diagnostics (pymc-devs#3981) * first attempt to vectorize smc kernel * add ess, remove multiprocessing * run multiple chains * remove unused imports * add more info to report * minor fix * test log * fix type_num error * remove unused imports update BF notebook * update notebook with diagnostics * update notebooks * update notebook * update notebook * Honor discard_tuned_samples during KeyboardInterrupt (pymc-devs#3785) * Honor discard_tuned_samples during KeyboardInterrupt * Do not compute convergence checks without samples * Add time values as sampler stats for NUTS (pymc-devs#3986) * Add time values as sampler stats for NUTS * Use float time counters for nuts stats * Add timing sampler stats to release notes * Improve doc of time related sampler stats Co-authored-by: Alexandre ANDORRA <[email protected]> Co-authored-by: Alexandre ANDORRA <[email protected]> * Drop support for py3.6 (pymc-devs#3992) * Drop support for py3.6 * Update RELEASE-NOTES.md Co-authored-by: Colin <[email protected]> Co-authored-by: Colin <[email protected]> * Fix Mixture distribution mode computation and logp dimensions Closes pymc-devs#3994. * Add more info to divergence warnings (pymc-devs#3990) * Add more info to divergence warnings * Add dataclasses as requirement for py3.6 * Fix tests for extra divergence info * Remove py3.6 requirements * follow-up of py36 drop (pymc-devs#3998) * Revert "Drop support for py3.6 (pymc-devs#3992)" This reverts commit 1bf867e. * Update README.rst * Update setup.py * Update requirements.txt * Update requirements.txt Co-authored-by: Adrian Seyboldt <[email protected]> * Show pickling issues in notebook on windows (pymc-devs#3991) * Merge close remote connection * Manually pickle step method in multiprocess sampling * Fix tests for extra divergence info * Add test for remote process crash * Better formatting in test_parallel_sampling Co-authored-by: Junpeng Lao <[email protected]> * Use mp_ctx forkserver on MacOS * Add test for pickle with dill Co-authored-by: Junpeng Lao <[email protected]> * Fix keep_size for arviz structures. (pymc-devs#4006) * Fix posterior pred. sampling keep_size w/ arviz input. Previously posterior predictive sampling functions did not properly handle the `keep_size` keyword argument when getting an xarray Dataset as parameter. Also extended these functions to accept InferenceData object as input. * Reformatting. * Check type errors. Make errors consistent across sample_posterior_predictive and fast_sample_posterior_predictive, and add 2 tests. * Add changelog entry. Co-authored-by: Robert P. Goldman <[email protected]> * SMC-ABC add distance, refactor and update notebook (pymc-devs#3996) * update notebook * move dist functions out of simulator class * fix docstring * add warning and test for automatic selection of sort sum_stat when using wassertein and energy distances * update release notes * fix typo * add sim_data test * update and add tests * update and add tests * add docs for interpretation of length scales in periodic kernel (pymc-devs#3989) * fix the expression of periodic kernel * revert change and add doc * FIXUP: add suggested doc string * FIXUP: revertchanges in .gitignore * Fix Matplotlib type error for tests (pymc-devs#4023) * Fix for issue 4022. Check for support for `warn` argument in `matplotlib.use()` call. Drop it if it causes an error. * Alternative fix. * Switch from pm.DensityDist to pm.Potential to describe the likelihood in MLDA notebooks and script examples. This is done because of the bug described in arviz-devs/arviz#1279. The commit also changes a few parameters in the MLDA .py example to match the ones in the equivalent notebook. * Remove Dirichlet distribution type restrictions (pymc-devs#4000) * Remove Dirichlet distribution type restrictions Closes pymc-devs#3999. * Add missing Dirichlet shape parameters to tests * Remove Dirichlet positive concentration parameter constructor tests This test can't be performed in the constructor if we're allowing Theano-type distribution parameters. * Add a hack to statically infer Dirichlet argument shapes Co-authored-by: Brandon T. Willard <[email protected]> Co-authored-by: Bill Engels <[email protected]> Co-authored-by: Oriol Abril-Pla <[email protected]> Co-authored-by: Osvaldo Martin <[email protected]> Co-authored-by: Adrian Seyboldt <[email protected]> Co-authored-by: Alexandre ANDORRA <[email protected]> Co-authored-by: Colin <[email protected]> Co-authored-by: Brandon T. Willard <[email protected]> Co-authored-by: Junpeng Lao <[email protected]> Co-authored-by: rpgoldman <[email protected]> Co-authored-by: Robert P. Goldman <[email protected]> Co-authored-by: Tirth Patel <[email protected]> Co-authored-by: Brandon T. Willard <[email protected]>
I am in the process of rewriting the notebook to exploit all ArviZ and xarray goodness. Comments are very welcome and I hope it is useful as a way to showcase xarray mostly and ArviZ capabilities inherited from xarray. Part of #3959
Depending on what your PR does, here are a few things you might want to address in the description:
Doing this I have encountered a couple bugs in ArviZ so this notebook won't even run on ArviZ master, only on PR branch. I am not sure about the timeline of the doc sprint but it will end up either running on ArviZ master or on next release (there may be a release coming in the coming weeks 🤔). Depends on fix io_pymc3 bug when model not passed explicitly arviz-devs/arviz#1240 and [WIP] fix hdi false positive warning arviz-devs/arviz#1241All PRs have been merged and ArviZ 0.9.0 has been released!