Reproducibility of jupyter notebook example #44

Closed
dengzeyu opened this issue Nov 5, 2023 · 10 comments

Comments

dengzeyu commented Nov 5, 2023

I just ran the notebook examples and found that the numbers I got are slightly different from the numbers shown on the documentation webpage. I assume this is normal? If so, please indicate that there will be some differences when rerunning the notebook examples.

One of the examples is at https://kinisi.readthedocs.io/en/latest/vasp_dj.html

diff.D_J.n, diff.D_J.con_int()

I got

(1.423210688465144e-05, array([1.60546798e-06, 2.96878163e-05]))

on the webpage the results are:

(1.4064382415055966e-05, array([1.54355335e-06, 2.96839782e-05]))
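
For scale, the relative difference between those two point estimates is roughly 1.2% (a quick illustrative check, using the values copied from the outputs above):

docs_value = 1.4064382415055966e-05
rerun_value = 1.423210688465144e-05
print(abs(rerun_value - docs_value) / docs_value)  # ~0.012, i.e. about 1.2%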
bjmorgan (Owner) commented Nov 5, 2023

@arm61 this page in the docs talks about bootstrapping (which needs updating), which possibly means these examples were generated with the old variance estimation scheme.

zhubonan (Contributor) commented Nov 5, 2023

Just for tracking: this issue is related to the JOSS submission openjournals/joss-reviews#5984 (comment).

arm61 (Collaborator) commented Nov 5, 2023

This is a known issue because we use stochastic MCMC sampling. I have added the random state to the docs and removed the out-of-date mention of bootstrapping in e2dcee6 and 54734c6.
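
As a generic illustration of why fixing the random state makes the numbers repeatable (this is a plain numpy sketch, not the kinisi API; the stand-in sample values below are invented for illustration):

import numpy as np

# Two generators seeded identically: any stochastic estimate built from them
# (e.g. a posterior mean and credible interval from MCMC samples) matches exactly.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Stand-in for posterior samples of D_J (values are illustrative only).
samples_a = rng_a.normal(loc=1.7e-5, scale=9e-6, size=3200)
samples_b = rng_b.normal(loc=1.7e-5, scale=9e-6, size=3200)

print(samples_a.mean(), np.percentile(samples_a, [2.5, 97.5]))
print(np.array_equal(samples_a, samples_b))  # True: same seed, same numbers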

dengzeyu (Author) commented Nov 6, 2023

@arm61 I've rerun the notebook, but I still get different results from those shown on the documentation website. For the same example: https://kinisi.readthedocs.io/en/latest/vasp_dj.html

diff.D_J.n, diff.D_J.con_int()

The webpage shows

(1.6995159773675975e-05, array([1.40932494e-06, 3.78731225e-05]))

However, I got

(1.6953192500355356e-05, array([1.98685233e-06, 3.81374320e-05]))

In addition, these results differ a lot from the previous version shown above; did you change the input parameters?

arm61 (Collaborator) commented Nov 6, 2023

I suspect that this is due to small variations in the exact environment that you have. Note that we do not aspire to complete reproducibility between your build of the docs and those online (however, if you rerun the notebook you should get the same numbers repeatably). I am happy to add a statement to this effect to the documentation, but cross-machine reproducibility is a very hard problem, so it feels redundant.

bjmorgan (Owner) commented Nov 6, 2023

For reference, the difference in the "best-fit" estimate of D_J is ~0.1% of the width of the 95% compatibility interval.
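
A quick check of that figure, using the point estimates and the documentation's 95% interval quoted above (illustrative only):

docs_estimate = 1.6995159773675975e-05
local_estimate = 1.6953192500355356e-05
ci_lower, ci_upper = 1.40932494e-06, 3.78731225e-05  # docs 95% interval

difference = abs(docs_estimate - local_estimate)  # ~4.2e-08
interval_width = ci_upper - ci_lower              # ~3.6e-05
print(difference / interval_width)                # ~0.0012, i.e. about 0.1%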

dengzeyu (Author) commented Nov 6, 2023

> I suspect that this is due to small variations in the exact environment that you have. Note that we do not aspire to complete reproducibility between your build of the docs and those online (however, if you rerun the notebook you should get the same numbers repeatably). I am happy to add a statement to this effect to the documentation, but cross-machine reproducibility is a very hard problem, so it feels redundant.

Yeah, please add a statement and then I'll close this issue.

bjmorgan (Owner) commented Nov 6, 2023

Isn't this the case for any numerical (rather than analytical) method? @arm61 maybe a generic statement to cover the entire docs, rather than one just on this page (and then repeated on other pages)?

arm61 (Collaborator) commented Nov 6, 2023

I have added something about this to the FAQ in 0d9a4ab.

bjmorgan (Owner) commented Nov 6, 2023

This seems like a good place to me.

dengzeyu closed this as completed Nov 6, 2023