Test second shell script MCMC_800_1s-1.sh
#5
MCMC_800_1s-1.txt
MCMC_800_1s-1.sh
I get this error message from the imports section of the Python script:
@RoryAtBar Have you encountered this before? I've made sure to install the correct versions of tensorflow and tensorflow-probability, which I've now added to a requirements file and documented in the CSF setup file. The versions of packages I have from
Rory suggested trying gpflow <= 2.5.2.
I've resubmitted with gpflow=2.5.2, and it looks to be running so far...
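Pinning the version that works in the requirements file (for example a `gpflow==2.5.2` line) is the main fix. As an optional extra, a guard near the top of the script can fail fast if a job picks up a different environment. The following is a minimal sketch, not code from the repository: `release_tuple` and `installed_version_ok` are hypothetical helpers built on the standard library's `importlib.metadata.version`, and they compare only the numeric major.minor.patch part, ignoring pre-release tags. The third-party `packaging.version.Version` implements the full versioning rules if it is available.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Convert e.g. '2.5.2' to (2, 5, 2) and '2.9.0rc1' to (2, 9, 0); missing parts count as 0."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for char in piece:
            if not char.isdigit():
                break  # stop at suffixes such as 'rc1' or 'dev0'
            digits += char
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_version_ok(package, ceiling):
    """True if `package` is installed at a release <= `ceiling`, otherwise False."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    return release_tuple(installed) <= release_tuple(ceiling)
```

For example, `if not installed_version_ok("gpflow", "2.5.2"): raise SystemExit("tested with gpflow <= 2.5.2")` could sit just after the imports.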
OK, so I get what looks like sensible output, but also this error. Should I be concerned, and do you know how I can fix it? @RoryAtBar
The issue with running the script on GPUs I'm not sure about, but it doesn't sound like a major problem. As for the initial evaluation error: yes, I have encountered it before. Essentially, the likelihood function is somehow mis-specified and is giving spurious results, so the chains are being initialised outside the region allowed by the prior probability distribution (which is specified in the pm.Model() context manager). The likelihood function uses the Gaussian process (GP) model, so there could be something wrong with the GP. Does the script plot the GP fit? If the GP looks OK, then I'll need to plot some values of the likelihood function. It might be worth me having a play with the script; I can have a look early next week.
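One quick way to look at "some values of the likelihood function" is to evaluate the log-prior and log-likelihood at a handful of candidate starting points and flag any that are not finite, since a non-finite log-probability at the start point is what makes the initial evaluation fail. The sketch below is plain Python, not the project's PyMC model: the uniform prior bounds (the 0 to 1500 conductance range is borrowed from the job name), the Gaussian error model and the `surrogate` callable standing in for the GP prediction are all assumptions. Inside PyMC itself, `model.point_logps()` reports per-variable log-probabilities at the initial point in recent versions.

```python
import math

# Hypothetical uniform prior bounds; the real priors live in the script's pm.Model() block.
PRIOR_BOUNDS = {"Friction": (0.0, 1.0), "Conductance": (0.0, 1500.0)}


def log_prior(params):
    """Log density of independent uniform priors; -inf outside the bounds (or for NaN)."""
    total = 0.0
    for name, (low, high) in PRIOR_BOUNDS.items():
        if not low <= params[name] <= high:
            return -math.inf
        total -= math.log(high - low)
    return total


def gaussian_log_likelihood(predicted, observed, sigma):
    """Independent Gaussian errors with standard deviation sigma (an assumed error model)."""
    total = 0.0
    for pred, obs in zip(predicted, observed):
        total += -0.5 * ((obs - pred) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return total


def find_bad_start_points(points, surrogate, observed, sigma):
    """Return (params, log_prior, log_likelihood) for every point where either is not finite."""
    bad = []
    for params in points:
        lp = log_prior(params)
        ll = gaussian_log_likelihood(surrogate(params), observed, sigma)
        if not (math.isfinite(lp) and math.isfinite(ll)):
            bad.append((params, lp, ll))
    return bad
```

A NaN coming out of the surrogate propagates into the log-likelihood, so a GP that breaks down in part of the parameter space shows up directly in the returned list.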
Hi Rory, you asked on Slack:
Not that I can see. I guess this means it's an important error :)
Hi @RoryAtBar
Hi Gerard,
I have been very ill this week, so I haven't. I will look at it ASAP.
Rory
Sent from Outlook for Android<https://aka.ms/AAb9ysg>
________________________________
From: Gerard Capes ***@***.***>
Sent: Friday, May 17, 2024 3:59:49 PM
To: RoryAtBar/Abaqus_bayesian_matflow ***@***.***>
Cc: RoryAtBar ***@***.***>; Mention ***@***.***>
Subject: Re: [RoryAtBar/Abaqus_bayesian_matflow] Test second shell script `MCMC_800_1s-1.sh` (Issue #5)
Hi @RoryAtBar
Did you manage to have a look at this?
—
Reply to this email directly, view it on GitHub<#5 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/A3IND576PQUVGSOBT3CLUK3ZCYLOLAVCNFSM6AAAAABFVR4PVKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJXG44TIMJQGQ>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
Hi Gerard, I have added a solution to a separate branch (gp_kernel_tester), which trains GP models of increasing flexibility until one works. It's crude and not scientifically rigorous, but it is adequate for this specific problem, although it might need to be changed later if a more general solution is needed. It seems to be working for now.
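The idea of training GP models of increasing flexibility until one works can be sketched without GPflow. This is a minimal NumPy illustration under stated assumptions, not the code on the gp_kernel_tester branch: it uses exact GP regression with a squared-exponential (RBF) kernel and fixed hyperparameters, treats a shorter length-scale as "more flexible", and counts a candidate as working when its Cholesky factorisation succeeds and its RMSE on held-out points is below a tolerance. The held-out check is an added assumption; a check on the training points alone would happily accept an overfitted model.

```python
import numpy as np


def rbf_kernel(xa, xb, lengthscale, variance=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def fit_gp(x, y, lengthscale, noise=1e-2, jitter=1e-8):
    """Exact GP regression with fixed hyperparameters; returns the posterior-mean predictor.

    Raises np.linalg.LinAlgError if the Cholesky factorisation fails.
    """
    K = rbf_kernel(x, x, lengthscale) + (noise + jitter) * np.eye(len(x))
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y using the Cholesky factor: solve L z = y, then L^T alpha = z.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

    def predict(x_new):
        return rbf_kernel(np.asarray(x_new, dtype=float), x, lengthscale) @ alpha

    return predict


def fit_first_acceptable(x, y, x_val, y_val, lengthscales, tol):
    """Try length-scales from smoothest (longest) to most flexible (shortest)."""
    for lengthscale in sorted(lengthscales, reverse=True):
        try:
            predict = fit_gp(x, y, lengthscale)
        except np.linalg.LinAlgError:
            continue  # factorisation failed: move on to a more flexible kernel
        rmse = float(np.sqrt(np.mean((predict(x_val) - y_val) ** 2)))
        if np.isfinite(rmse) and rmse < tol:
            return lengthscale, predict
    raise RuntimeError("no candidate length-scale met the tolerance")
```

Stopping at the first acceptable candidate, starting from the smoothest, biases the choice towards the simplest model that fits, which is one way to limit overfitting.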
Just submitted a job using this new script.
Hang on, I think I know what this one is. I will sort it out.
________________________________
From: Gerard Capes ***@***.***>
Sent: Friday, May 31, 2024 1:31:08 PM
AttributeError: module 'gpflow.models' has no attribute 'Matern52'
@RoryAtBar Any ideas on this one?
This is from MCMC_800C_1s-1.py, right? The error makes it sound like somewhere in the code there is a line that says: If that was the case, then the fix is to change this line to
where
I had this previously: when creating the gp_kernel_tester branch, I had put this in by mistake and then fixed it. When you sent this error, I presumed I had simply forgotten to push the fix to GitHub. However, I can't find this error in the code. Would you be able to point me to it?
@RoryAtBar
The output log file looks OK, except that the progress bar lines have Mac line endings, which don't match the rest of the file. Do you know which part of the code generates these?
1) I don't know much about pyarrow.
2) The failed Cholesky decomposition is hopefully dealt with by the additions in the gp_kernel_check branch, but I will check. I created an additional output.txt file, since printing to the standard output file doesn't always work.
3) The progress bar is generated to track the progress of sampling when this is called:
idata = pm.sample()
This is called within the PyMC model context manager, which the code creates with this line:
with pm.Model() as model:
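Terminal progress bars usually redraw in place by writing a carriage return (`\r`) without a newline, so when stdout is redirected to a file each redraw ends with a bare CR, which is the Classic Mac OS line ending. Passing `progressbar=False` to `pm.sample()` turns the bar off entirely. Alternatively, the log can be post-processed; the small standard-library sketch below (the function name is hypothetical) keeps only the final redraw of each line.

```python
def collapse_carriage_returns(text):
    r"""Keep only the last non-empty '\r'-separated segment of each line.

    A progress bar redraws a line by emitting '\r' before each update, so the last
    segment is what a terminal would finally display. Empty segments (e.g. from a
    Windows-style '\r\n' ending) are skipped.
    """
    cleaned = []
    for line in text.split("\n"):
        segments = [segment for segment in line.split("\r") if segment]
        cleaned.append(segments[-1] if segments else "")
    return "\n".join(cleaned)
```

When reading the log file, open it with `newline=""` (e.g. `open(path, newline="")`); Python's default universal-newline mode would otherwise translate each bare `\r` into `\n` before the function sees it.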
The low effective sample size may need addressing. I will look when I get a chance later today.
________________________________
From: Gerard Capes ***@***.***>
Sent: Wednesday, June 5, 2024 11:13:36 AM
@RoryAtBar
Could you take a quick look at this and confirm whether they're as expected?
$ cat MCMC_800C_1s-1.sh.e4990184
mkdir: cannot create directory ‘/mnt/iusers01/support/mbexegc2/scratch/MCMC_GPsurrgt_800C_1s-1_cond0-1500_20000_chain’: File exists
/net/scratch2/mbexegc2/Abaqus_bayesian_matflow/MCMC_800C_1s-1.py:10: DeprecationWarning:
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at pandas-dev/pandas#54466
import pandas as pd
2024-06-04 09:15:26.741914: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-06-04 09:15:29.923159: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-06-04 09:15:29.924662: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-04 09:16:18.024203: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-06-04 09:44:59.306185: W tensorflow/core/kernels/linalg/cholesky_op.cc:56] Cholesky decomposition was not successful. Eigen::LLT failed with error code 1. Filling lower-triangular output with NaNs.
2024-06-04 09:45:06.553742: W tensorflow/core/kernels/linalg/cholesky_op.cc:56] Cholesky decomposition was not successful. Eigen::LLT failed with error code 1. Filling lower-triangular output with NaNs.
/net/scratch2/mbexegc2/Abaqus_bayesian_matflow/MCMC_800C_1s-1.py:465: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
Force_at_800C_1s[n] = dat[:,1][abs((dat[:,0]+x_correction)-step)==min(abs((dat[:,0]+x_correction)-step))]
Sequential sampling (5 chains in 1 job)
CompoundStep
Metropolis: [Friction]
Metropolis: [Conductance]
Sampling 5 chains for 10_000 tune and 20_000 draw iterations (50_000 + 100_000 draws total) took 23522 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
The output log file looks OK, except that the progress bar lines show Mac line endings, which don't match the rest of the file. Do you know which part of the code generates these?
MCMC_800C_1s-1.sh.o4990184: |█████████████████████████████| 100.00% [30000/30000 1:16:54<00:00 Sampling chain 4, 0 divergences]
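The NumPy DeprecationWarning in the log comes from the line at MCMC_800C_1s-1.py:465: the boolean mask selects every row at the minimum distance, so the right-hand side is an array (normally of length one) that is then assigned to a single element of Force_at_800C_1s. One way to remove the warning, sketched below on the assumption that `dat` is a 2-D array with displacement in column 0 and force in column 1 (as the indexing suggests), is to pick a single index with `np.argmin`. The helper name is hypothetical. Note one behavioural difference: if several rows tie for the minimum distance, `argmin` returns the first, whereas the mask returns all of them, which would make the original assignment fail.

```python
import numpy as np


def force_at_step(dat, step, x_correction=0.0):
    """Force (column 1) at the row whose corrected displacement (column 0) is closest to `step`."""
    distance = np.abs((dat[:, 0] + x_correction) - step)
    nearest = np.argmin(distance)  # a single integer index; the first one on ties
    return float(dat[nearest, 1])
```

In the loop this would read `Force_at_800C_1s[n] = force_at_step(dat, step, x_correction)`.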
Very sorry for the slow response. The results look OK, although the actual values are a bit odd, possibly because this was tested with a limited set of data (conductance limited to 1500). I'm not getting the low effective sample size issue; maybe I'm using a different set of input data from you? All I have done is use the scripts currently in the main branch.
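The sampler warnings in the log flag R-hat above 1.01 and an effective sample size below 100 per chain. ArviZ's `az.rhat` and `az.ess` (or `az.summary`) are the tools that actually compute these from an `InferenceData` object such as `idata`. To make the R-hat check concrete, the NumPy sketch below implements the classic split R-hat, in which each chain is cut in half and between-sequence variance is compared with within-sequence variance. It is a simplification: ArviZ defaults to the rank-normalised variant described in the paper linked in the warning, so its numbers will differ slightly. The function name is hypothetical.

```python
import numpy as np


def split_rhat(draws):
    """Classic split R-hat for one parameter; `draws` has shape (n_chains, n_draws)."""
    draws = np.asarray(draws, dtype=float)
    half = draws.shape[1] // 2
    # Split every chain into a first and second half, giving 2 * n_chains sequences.
    sequences = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n = sequences.shape[1]
    within = sequences.var(axis=1, ddof=1).mean()  # W: mean within-sequence variance
    between = n * sequences.mean(axis=1).var(ddof=1)  # B: between-sequence variance
    pooled = (n - 1) / n * within + between / n  # pooled estimate of the posterior variance
    return float(np.sqrt(pooled / within))
```

Values close to 1 indicate that the chains agree; a chain stuck in a different region of parameter space pushes R-hat well above the 1.01 threshold.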
Hi Rory, that's encouraging. It's been a while since I last looked at this, but I think I was using the
Sure, but I don't get it on that branch either. There is a small amount of randomness in the starting point from which the GPs are trained, but unless you are using a different set of data, I can't think what else it could be...
________________________________
From: Gerard Capes ***@***.***>
Sent: Friday, June 14, 2024 2:58:47 PM
Hi Rory,
That's encouraging. It's been a while since I last looked at this but I think I was using the gp_kernel_tester branch.
Might it be different versions of the libraries, perhaps?
I'll re-run this next week to see if I still get the error. Rory said there's a bit of randomness involved, and I might have got a bad seed. The script could be set up to restart if it fails, but currently isn't.
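Restarting on failure could look like a generic retry loop over random seeds. This is an illustration only: `run_with_restarts` and the `train` callable are hypothetical, and which exceptions count as a failure is an assumption (pass e.g. `np.linalg.LinAlgError` or a TensorFlow error class via `retry_on` as needed). Because the job log shows TensorFlow's Cholesky op filling its output with NaNs instead of raising, the sketch also treats a non-finite loss as a failure.

```python
import math


def run_with_restarts(train, seeds, retry_on=(ArithmeticError, ValueError)):
    """Call train(seed) for each seed until one returns a finite loss.

    `train` is assumed to return (loss, result). A seed counts as a failure if
    train raises one of `retry_on` or returns a NaN or infinite loss.
    """
    failures = []
    for seed in seeds:
        try:
            loss, result = train(seed)
        except retry_on as exc:
            failures.append((seed, repr(exc)))
            continue
        if math.isfinite(loss):
            return seed, result
        failures.append((seed, f"non-finite loss {loss!r}"))
    raise RuntimeError(f"all seeds failed: {failures}")
```

Returning the successful seed alongside the result makes a good run reproducible later.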
I forgot that this script uses the output from the first one... I was tidying up and deleted that output, so I'm running the first script again before I can run the second. 🙄
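Since the second script depends on the first one's output, and the job log above also shows a `mkdir: cannot create directory ...: File exists` error when an output directory is reused, a small pre-flight check could make re-runs less fragile. In the shell script itself, `mkdir -p` does not fail when the directory already exists; the Python sketch below shows the equivalent, `Path.mkdir(parents=True, exist_ok=True)`, together with a check that the first script's output is present. The function name and directory arguments are hypothetical.

```python
from pathlib import Path


def prepare_second_run(first_output_dir, second_output_dir):
    """Check the first script's output exists, then create the second script's output directory."""
    first = Path(first_output_dir)
    if not first.is_dir() or not any(first.iterdir()):
        raise FileNotFoundError(
            f"{first} is missing or empty; run the first script before the second"
        )
    second = Path(second_output_dir)
    second.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`: no error if it already exists
    return second
```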
Second script now running using the
Same error. Re-reading some of the details, I see this was the wrong branch! Resubmitting on
Thanks Gerard. Unfortunately, the image shows an extreme case of overfitting. I have reworked the way the Gaussian processes are trained for the part of the project I am currently working through. At the risk of you killing me, can we have a call where I show you how I want it to work?
Then the MCMC step in that script needs a small modification to adapt to the change above.
Sure, I could do tomorrow or Friday?
Check I can get this to run on CSF3