
Thermalization of CPU LB broken #3804

Closed
pkreissl opened this issue Jul 17, 2020 · 7 comments · Fixed by #3847

Comments

@pkreissl
Contributor

IMHO, LB thermalization is broken for CPU. Minimal example:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--use_gpu", default=False, action="store_true")
parser.add_argument("--kt", type=float, default=0)
args = parser.parse_args()

import espressomd
import espressomd.lb
from espressomd.observables import LBFluidPressureTensor

system = espressomd.System(box_l=[20, 20, 20])
system.time_step = 0.01
system.cell_system.skin = 0.4
if args.use_gpu:
    lbf = espressomd.lb.LBFluidGPU(agrid=1.0, dens=1.9, visc=1.0, tau=0.01, kT=args.kt, seed=7)
else:
    lbf = espressomd.lb.LBFluid(agrid=1.0, dens=1.9, visc=1.0, tau=0.01, kT=args.kt, seed=7)
system.actors.add(lbf)
lb_pressure = LBFluidPressureTensor()
for i in range(2):  # change as you like
    system.integrator.run(1000)  # same here, problem persists
    print(lb_pressure.calculate())

Testing via run_minimal.sh:

#!/bin/bash

echo "CPU, kt=0:"
./pypresso minimal.py
echo "GPU, kt=0:"
./pypresso minimal.py --use_gpu
echo "CPU, kt=1:"
./pypresso minimal.py --kt 1
echo "GPU, kt=1:"
./pypresso minimal.py --use_gpu --kt 1

yields:

$ ./run_minimal.sh 
CPU, kt=0:
[[6333.33325386    0.            0.        ]
 [   0.         6333.33325386    0.        ]
 [   0.            0.         6333.33325386]]
[[6333.33325386    0.            0.        ]
 [   0.         6333.33325386    0.        ]
 [   0.            0.         6333.33325386]]
GPU, kt=0:
WARNING: More than one GPU detected, please note ESPResSo uses device 0 by default regardless of usage or capability. The GPU to be used can be modified by setting System.cuda_init_handle.device.
[[6333.33353698    0.            0.        ]
 [   0.         6333.33353698    0.        ]
 [   0.            0.         6333.33353698]]
[[6333.33353698    0.            0.        ]
 [   0.         6333.33353698    0.        ]
 [   0.            0.         6333.33353698]]
CPU, kt=1:
[[6.39378045e+03 1.90899139e+00 1.95662446e+00]
 [1.90899139e+00 6.38988958e+03 1.94024253e+00]
 [1.95662446e+00 1.94024253e+00 6.38844944e+03]]
[[6.39384017e+03 1.91773972e+00 2.03000427e+00]
 [1.91773972e+00 6.39001967e+03 1.98502662e+00]
 [2.03000427e+00 1.98502662e+00 6.38863714e+03]]
GPU, kt=1:
WARNING: More than one GPU detected, please note ESPResSo uses device 0 by default regardless of usage or capability. The GPU to be used can be modified by setting System.cuda_init_handle.device.
[[ 6.33442971e+03 -2.38058610e-02 -6.62311537e-02]
 [-2.38058610e-02  6.33435239e+03  4.31334905e-02]
 [-6.62311537e-02  4.31334905e-02  6.33439990e+03]]
[[ 6.33472237e+03 -5.51996531e-02 -2.82031257e-02]
 [-5.51996531e-02  6.33468426e+03  1.38701318e-02]
 [-2.82031257e-02  1.38701318e-02  6.33475568e+03]]

The kT=0 case is fine, but with thermalization the CPU produces off-diagonal pressure values of order $10^0$, whereas the GPU gives order $10^{-2}$, which seems more reasonable, I think.
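For reference, a small NumPy helper (illustrative only, not part of the bug report; the helper name is hypothetical) that pulls out the off-diagonal magnitudes being compared above:

```python
import numpy as np

def max_off_diagonal(tensor):
    """Largest absolute off-diagonal element of a 3x3 pressure tensor."""
    t = np.asarray(tensor)
    return np.max(np.abs(t[~np.eye(3, dtype=bool)]))

# Values copied from the CPU and GPU kT=1 outputs above.
cpu_kt1 = [[6.39378045e+03, 1.90899139e+00, 1.95662446e+00],
           [1.90899139e+00, 6.38988958e+03, 1.94024253e+00],
           [1.95662446e+00, 1.94024253e+00, 6.38844944e+03]]
gpu_kt1 = [[6.33442971e+03, -2.38058610e-02, -6.62311537e-02],
           [-2.38058610e-02, 6.33435239e+03, 4.31334905e-02],
           [-6.62311537e-02, 4.31334905e-02, 6.33439990e+03]]
print(max_off_diagonal(cpu_kt1))  # ~2.0, i.e. order 10^0
print(max_off_diagonal(gpu_kt1))  # ~0.07, i.e. order 10^-2
```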

@RudolfWeeber
Contributor

RudolfWeeber commented Jul 17, 2020 via email

@pkreissl
Contributor Author

pkreissl commented Jul 17, 2020

With GPU, integrating the ACF produces roughly reasonable values for the viscosity via Green-Kubo; for CPU, the ACF does not even decay yet (~1e6 sampling steps, and the CPU version runs A LOT slower than the GPU one...).
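For context, a minimal sketch of the Green-Kubo estimate referred to here, assuming a time series of an off-diagonal pressure-tensor component has already been recorded (function and variable names are hypothetical, not ESPResSo API):

```python
import numpy as np

def green_kubo_viscosity(sigma_xy, kT, volume, dt):
    """Shear viscosity via the Green-Kubo relation
    eta = V / kT * integral_0^inf <sigma_xy(0) sigma_xy(t)> dt.
    Uses a direct O(n^2) autocorrelation for clarity, truncated at n/2 lags."""
    s = np.asarray(sigma_xy) - np.mean(sigma_xy)
    n = len(s)
    acf = np.array([np.mean(s[:n - lag] * s[lag:]) for lag in range(n // 2)])
    return volume / kT * np.trapz(acf, dx=dt)
```

In practice the integral must be truncated once the ACF has decayed into noise; if the ACF has not decayed at all (as reported for the CPU here), the estimate is meaningless.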

@pkreissl
Contributor Author

Concerning Ulf's PhD thesis: I have not had much to do with it yet apart from some browsing, so I am not immediately familiar with the different notations and definitions used there. I will give it a try; however, it will probably be a lot faster if someone more involved with this, e.g. @mkuron or @KaiSzuttor, could comment...

@pkreissl
Contributor Author

pkreissl commented Jul 17, 2020

ACF of the off-diagonal elements using 400000 samples (and a slightly different LB parameter set, nothing fancy), evaluated every second time step, for GPU:
Screenshot from 2020-07-17 13-49-50
and for CPU:
Screenshot from 2020-07-17 13-50-35

@pkreissl
Contributor Author

In an offline discussion, @RudolfWeeber suggested that this issue might be a regression. I checked ESPResSo version 4.1.0: same problem. For 4.0.0 there doesn't seem to be a fluid stress observable at all (neither LBFluidPressureTensor nor LBFluidStress)?! So this problem does indeed seem to have been around for some time.

@KaiSzuttor
Member

How do you know that the issue has been around for some time if you cannot check for versions older than 4.1.0 (which is less than a year old)?

@pkreissl
Contributor Author

pkreissl commented Jul 22, 2020

How do you know that the issue has been around for some time if you cannot check for versions older than 4.1.0 (which is less than a year old)?

Let me rephrase that: I couldn't find the feature before 4.1.0 (if it just had a different name, please tell me and I'll check for the issue). The LBFluidStress observable was introduced with 4.1.0 (PR #2054), so it seems to have been broken from the beginning...

@pkreissl pkreissl changed the title Combination of Thermalization & LBFluidStressTensor broken Thermalization of CPU LB broken Aug 3, 2020
@kodiakhq kodiakhq bot closed this as completed in #3847 Sep 30, 2020
kodiakhq bot added a commit that referenced this issue Sep 30, 2020
Fixes #3804 , fixes #3772

The issue was actually a regression introduced with the switch to Philox in commit
[f3cc4ba](f3cc4ba): random numbers formerly drawn from the interval (-0.5, 0.5] were replaced by random numbers in (0, 1].
With this fix, `lb_pressure_tensor_acf.py` runs successfully for both CPU and GPU. However, as @KaiSzuttor correctly mentioned in PR #3831, the CPU part of the test takes a while to execute (on my machine, single core, the whole test takes 136 s). I could try to make it faster, which would, however, require tweaking the tolerance limits `tol_node` and `tol_global`. @RudolfWeeber, what was your reasoning behind the chosen limits? Or are they semi-arbitrary choices?

Further, this PR corrects the comparison of the off-diagonal elements `avg_ij` vs. `avg_ji` in the test.
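The nature of the regression can be illustrated with plain NumPy (a hedged sketch, not the actual Philox code path in ESPResSo): noise drawn from a unit interval starting at 0 has mean 0.5, so using it where zero-mean noise from (-0.5, 0.5] is expected biases the thermal fluctuations; subtracting 0.5 restores the intended zero-mean interval.

```python
import numpy as np

rng = np.random.default_rng(2020)
u = rng.random(100_000)   # uniform in [0, 1): mean ~0.5, biased if used as noise
shifted = u - 0.5         # uniform in [-0.5, 0.5): zero mean, as the LB expects

print(f"mean(u) = {u.mean():.3f}, mean(u - 0.5) = {shifted.mean():.3f}")
```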
jngrad pushed a commit to jngrad/espresso that referenced this issue Oct 13, 2020