lattice_dyson_g0_wk hangs when inside MPI #48
Staring at the C++ source code, it appears that the jankiest fix would be to do the following:

g_fk_t lattice_dyson_g0_fk_nompi(double mu, e_k_cvt e_k, mesh::refreq mesh, double delta) {
  auto I = nda::eye<ek_vt::scalar_t>(e_k.target_shape()[0]);
  g_fk_t g0_fk({mesh, e_k.mesh()}, e_k.target_shape());
  g0_fk() = 0.0;
  // frequency shift i*delta into the upper half plane
  std::complex<double> idelta(0.0, delta);
  auto arr = g0_fk.mesh();
  // OpenMP-only loop over all mesh points: no mpi_view / mpi::all_reduce,
  // so this function makes no collective MPI calls
  #pragma omp parallel for
  for (unsigned int idx = 0; idx < arr.size(); idx++) {
    auto [f, k] = arr[idx];
    g0_fk[f, k] = inverse((f + idelta + mu) * I - e_k[k]);
  }
  return g0_fk;
}

g_wk_t lattice_dyson_g0_wk_nompi(double mu, e_k_cvt e_k, mesh::imfreq mesh) {
  return lattice_dyson_g0_Xk_nompi<g_wk_t, mesh::imfreq>(mu, e_k, mesh);
}
Dear Andrew,

Thank you for the bug report. I am not sure what is going wrong, but I think it is an interference between the TPRF MPI behaviour and your use of MPI for scattering parameters to the MPI ranks. Please note that the MPI parallelization in TPRF assumes that all ranks call the routine together, with the same arguments. I think both of these requirements are violated by the example. First, the hopping is not the same for all ranks, since they work on different points in the meshgrid. Second, I think that if the number of elements is not evenly divided by the number of MPI ranks, there will be a subset of ranks that have a different length of the local parameter loop, and therefore make a different number of calls.

Could you please try if this simplified example also hangs?

import numpy as np
from triqs_tprf.tight_binding import TBLattice
from triqs_tprf.lattice import *
from triqs_tprf.lattice_utils import *
from triqs.gf import *
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
if True:
    t = 0.1
    H = TBLattice(
        units = [(1, 0, 0), (0, 1, 0)],
        hopping = {
            # nearest neighbour hopping -t
            ( 0,+1): -t * np.eye(2),
            ( 0,-1): -t * np.eye(2),
            (+1, 0): -t * np.eye(2),
            (-1, 0): -t * np.eye(2),
        },
        orbital_positions = [(0,0,0)]*2,
        orbital_names = ['up', 'do'],
    )
    e_k = H.fourier(H.get_kmesh(n_k=(60, 60, 1)))
    DLRmesh = MeshDLRImFreq(beta=100, statistic='Fermion', w_max=10, eps=1e-14)
    G_c_wk = lattice_dyson_g0_wk(mu=0.0, e_k=e_k, mesh=DLRmesh)
    print("Finished for loop iteration")
    print("FINISHED RUNNING ?")

Best, Hugo
Dear Hugo,

Thank you for the quick response. The example you provided does not hang, although it also does not accomplish what I am hoping to achieve. I would like to iterate over points in parameter space (such as density and hopping, say) and then perform calculations with the lattice Green's function. I need the outer MPI loop to do that (it is not present in your example), and the MPI inside the function seems to interfere with that outer loop.
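One pattern that respects the collective-call requirement (a sketch under the assumption that the scan is over hopping values; the values and mesh sizes are illustrative) is to have every rank run the same parameter loop, call lattice_dyson_g0_wk with identical arguments, and keep only the points assigned to it:

# Sketch: keep lattice_dyson_g0_wk collective by having all ranks loop
# over the *same* parameter points in the same order.
import numpy as np
from mpi4py import MPI
from triqs_tprf.tight_binding import TBLattice
from triqs_tprf.lattice import lattice_dyson_g0_wk
from triqs.gf import MeshImFreq

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

wmesh = MeshImFreq(beta=100, S='Fermion', n_iw=256)
results = {}
for i, t in enumerate(np.linspace(0.1, 1.0, 8)):
    H = TBLattice(
        units = [(1, 0, 0), (0, 1, 0)],
        hopping = {( 0,+1): -t * np.eye(2), ( 0,-1): -t * np.eye(2),
                   (+1, 0): -t * np.eye(2), (-1, 0): -t * np.eye(2)},
        orbital_positions = [(0,0,0)]*2,
        orbital_names = ['up', 'do'],
    )
    e_k = H.fourier(H.get_kmesh(n_k=(16, 16, 1)))
    # All ranks make this call with identical arguments, so the
    # collective operations inside TPRF stay matched.
    g0_wk = lattice_dyson_g0_wk(mu=0.0, e_k=e_k, mesh=wmesh)
    if i % size == rank:
        results[t] = g0_wk  # post-process only the points owned by this rank

If the mpi_view in the C++ above is indicative, each call is itself parallelized over the mesh, so the ranks cooperate within every call and only the cheap lattice setup is duplicated.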
I'm currently running into problems where sometimes this function seems to hang when called many times with MPI.
Inside a bash script, for example, I run the following commands (hoping to get OpenMP not to interfere with an MPI loop)
where a minimal working example is
which hangs and never completes. I need to perform operations (like saving data) after this loop, so I need to understand how to prevent the hanging.
(This issue also occurs without the DLR mesh.)
By the way, I also get a warning when calling lattice_dyson_g0_wk, which is probably also worth fixing:

DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
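For context, that NumPy warning is raised when a size-1 array is implicitly converted to a Python scalar; a minimal reproduction (hypothetical, not the actual call site inside lattice_dyson_g0_wk) and the future-proof fix:

# Reproduces the warning on NumPy >= 1.25 and shows the explicit fix.
import numpy as np

a = np.array([0.5])  # 1-d array holding a single element

x = float(a)         # DeprecationWarning: conversion of ndim > 0 array to scalar

x = a.item()         # explicit element extraction, no warning
x = float(a[0])      # equivalent alternative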