-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optionally assemble using multiple threads #625
Conversation
I extended your above snippet to import numpy as np
from skfem import *
from numba import jit
from skfem.helpers import dot, grad
m = MeshTet().refined(5)
basis = InteriorBasis(m, ElementTetP2())
@BilinearForm
def bilinf2(u, v, w):
return dot(grad(u), grad(v))
@jit(nogil=True, nopython=True)
def nlaplace(out, du, dv):
for i in range(du.shape[1]):
for j in range(du.shape[2]):
for k in range(du.shape[0]):
out[i, j] += du[k, i, j] * dv[k, i, j]
@BilinearForm(nthreads=4)
def bilinf1(u, v, w):
out = np.empty_like(u.grad[0])
nlaplace(out, u.grad, v.grad)
return out Then I measured execution times with In [2] : %timeit bilinf1.assemble(basis)
3.13 s ± 317 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) and another execution with the default assembly In [3] : %timeit bilinf2.assemble(basis)
4.18 s ± 185 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) (I'm using my Surface Pro 4 now so execution times could be a bit slow...). Therefore the execution time for this example is about 25% faster for the parallel version with 4 threads. If I use 8 or 16 threads it takes only If I try to assemble the scikit-fem-"default" I'll check my hyperelasticity problem from #616 ... |
For the hyperelasticity example in #616 the total execution time is 50s (nthreads=8) compared to 76s (no parallel assembly) for one increment with 4 newton-iterations. If I compare the execution times of the |
import numpy as np
from scipy.sparse import bmat
import skfem
from skfem.helpers import grad, det, inv
mu = 1.0
bulk = 5000.0
# Newton-Rhapson parameters
tol = 1e-8
maxiter = 8
# array with values of "External Displacement Z" on boundary "move"
move_uz = -np.arange(0,0.2,0.1)
print("Init Stage")
print("==========")
print("* Init mesh.")
m = skfem.mesh.MeshHex().refined(3)
eu = skfem.element.ElementVectorH1(skfem.element.ElementHex1())
ep = skfem.element.ElementHex0()
eJ = skfem.element.ElementHex0()
e = {
"u": eu,
"p": ep,
"J": eJ
}
print("* Init basis for (Hu-Washizu) Three-Field-Variation (u-p-J̅)")
print(" for nearly-incompressible hyperelastic materials.\n")
print(" Π_int = ∫ ̂ψ(̂F) dV + ∫ U(J̅) dV + ∫ p ((det(F) - J̅) dV\n")
basis = {"u": skfem.InteriorBasis(m,eu, intorder=1),
"p": skfem.InteriorBasis(m,ep, intorder=0),
"J": skfem.InteriorBasis(m,ep, intorder=0)
}
def transpose(A):
"Transpose of 2d and Major-transpose of 4d Arrays with trailing axes."
if len(np.shape(A))-2 == 0:
return A.T
elif len(np.shape(A))-2 == 2:
return np.transpose(A,(1,0,2,3))
elif len(np.shape(A))-2 == 4:
return np.transpose(A,(2,3,0,1,4,5))
elif len(np.shape(A))-2 > 4:
return np.transpose(A,(2,3,0,1,*range(len(A.shape))[4:]))
else:
raise NotImplementedError("Unknown shape of array.")
def identity(A):
"Identity Matrix of same shape as A with trailing axes."
if isinstance(A,np.ndarray):
ndim = A.shape[0]
a,b = A.shape[-2:]
else:
# scikit-fem
ndim = A.value.shape[0]
a,b = A.value.shape[-2:]
return np.tile(np.identity(ndim), (b,a,1,1)).T
def dya(A,B=None):
"Dyadic (outer tensor) product of two Tensors."
if B is None:
B = A
return np.einsum("ij...,kl...->ijkl...",A,B)
def cdyau(A,B=None):
"ik-crossed dyadic product of two Tensors."
if B is None:
B = A
return np.einsum("ij...,kl...->ikjl...",A,B)
def cdyal(A,B=None):
"il-crossed dyadic product of two Arrays."
if B is None:
B = A
return np.einsum("ij...,kl...->ilkj...",A,B)
def cdya(A,B=None):
"Symmetric crossed dyadic product of two Arrays."
if B is None:
B = A
return (cdyau(A,B) + cdyal(A,B)) / 2
def __dot22(A,B):
"Dot product of a 2d and a 2d Array."
return np.einsum("ij...,jk...->ik...",A,B)
def __dot42(A,B):
"Dot product of a 4d and a 2d Array."
return np.einsum("ijkl...,lm...->ijkm...",A,B)
def __dot24(A,B):
"Dot product of a 2d and a 4d Array."
return np.einsum("ij...,jklm...->iklm...",A,B)
def __dot44(A,B):
"Dot product of a 4d and a 4d Array."
return np.einsum("ijkl...,lmnp...->ijkmnp...",A,B)
def _singledot(A,B):
"Dot product of two Arrays."
if len(np.shape(A))-2 == 2 and len(np.shape(B))-2 == 4:
return __dot24(A,B)
elif len(np.shape(A))-2 == 4 and len(np.shape(B))-2 == 2:
return __dot42(A,B)
else:
return __dot22(A,B)
def dot(A,B,*args):
"Multi-argument dot product of two Arrays."
C = _singledot(A,B)
for i in range(len(args)):
C = _singledot(C,args[i])
return C
def __ddot22(A,B):
"Double dot product of a 2d and a 2d Array."
return np.einsum("ij...,ij...->...",A,B)
def __ddot24(A,B):
"Double dot product of a 2d and a 4d Array."
return np.einsum("ij...,ijkl...->kl...",A,B)
def __ddot42(A,B):
"Double dot product of a 4d and a 2d Array."
return np.einsum("ijkl...,kl...->ij...",A,B)
def __ddot44(A,B):
"Double dot product of a 4d and a 4d Array."
return np.einsum("ijkl...,klmn...->ijmn...",A,B)
def _singledoubledot(A,B):
"Double-Dot product of two Arrays."
if len(np.shape(A))-2 == 2 and len(np.shape(B))-2 == 4:
return __ddot24(A,B)
elif len(np.shape(A))-2 == 4 and len(np.shape(B))-2 == 2:
return __ddot42(A,B)
else:
return __ddot22(A,B)
def ddot(A,B,*args):
"Multi-argument double-dot product of two Arrays."
C = _singledoubledot(A,B)
for i in range(len(args)):
C = _singledoubledot(C,args[i])
return C
def _det(A):
"Determinant with optional trailing axes."
return np.linalg.det(A.T).T
def _inv(A):
"Inverse with optional trailing axes."
return np.linalg.inv(A.T).T
def trace(A):
"Trace with optional trailing axes."
return np.trace(A)
def vol(A):
"Spherical (volumetric) part of A (with optional trailing axes)."
return trace(A)/3 * identity(A)
def dev(A):
"Deviatoric part of A (with optional trailing axes)."
return A - vol(A)
def fields(w):
return w["displacements"], w["pressure"].value, w["volumeratio"].value
def F1(w):
"""1st Piola-Kirchhoff stress tensor obtained from
variation of the potential w.r.t. the displacements."""
u,p,theta = fields(w)
F = grad(u) + identity(u)
J = det(F)
Pdev = mu * (F - ddot(F,F)/3*transpose(inv(F))) * J**(-2/3)
Pvol = p * J * transpose(inv(F))
return Pdev + Pvol
def F2(w):
"""Equilibrium equation obtained from variation
of the potential w.r.t. the pressure."""
u,p,theta = fields(w)
F = grad(u) + identity(u)
J = det(F)
return J-theta
def F3(w):
"""Equilibrium equation obtained from variation
of the potential w.r.t. the volume ratio 'theta'."""
u,p,theta = fields(w)
return bulk * (theta-1) - p
def A11(w):
"""Elasticity tensor obtained from the linearization
w.r.t. the displacements of the variation of the
potential w.r.t. the displacements."""
u,p,theta = fields(w)
eye = identity(u)
F = grad(u) + eye
J = det(F)
iFT = transpose(inv(F))
A4_dev = mu * ( cdyau(eye)
-2/3 * dya(F,iFT)
-2/3 * dya(iFT,F)
+2/9 * ddot(F,F) * dya(iFT)
+1/3 * ddot(F,F) * cdyal(iFT)
) * J**(-2/3)
A4_vol = p * J * (dya(iFT) - cdyal(iFT))
return A4_dev + A4_vol
def A12(w):
"""Linearization w.r.t. the displacements of the variation of the
potential w.r.t. the pressure."""
u,p,theta = fields(w)
F = grad(u) + identity(u)
J = det(F)
return J*transpose(inv(F))
def A13(w):
"""Linearization w.r.t. the displacements of the variation of the
potential w.r.t. the volume ratio."""
u,p,theta = fields(w)
return np.zeros_like(grad(u))
def A22(w):
"""Linearization w.r.t. the pressure of the variation of the
potential w.r.t. the pressure."""
u,p,theta = fields(w)
return np.zeros_like(p)
def A23(w):
"""Linearization w.r.t. the volume ratio of the variation of the
potential w.r.t. the pressure."""
u,p,theta = fields(w)
return - np.ones_like(p)
def A33(w):
"""Linearization w.r.t. the volume ratio of the variation of the
potential w.r.t. the volume ratio."""
u,p,theta = fields(w)
return np.ones_like(theta) * bulk
print("* Init forms.")
@skfem.LinearForm(nthreads=8)
def a1(v, w):
return ddot(F1(w), grad(v))
@skfem.LinearForm
def a2(v, w):
return F2(w) * v
@skfem.LinearForm
def a3(v, w):
return F3(w) * v
@skfem.BilinearForm(nthreads=8)
def b11(u, v, w):
return ddot(grad(u), A11(w), grad(v))
@skfem.BilinearForm
def b22(u, v, w):
return A22(w) * u * v
@skfem.BilinearForm
def b33(u, v, w):
return A33(w) * u * v
@skfem.BilinearForm
def b12(u, v, w):
return ddot(A12(w), grad(v)) * u
@skfem.BilinearForm
def b13(u, v, w):
return ddot(A13(w), grad(v)) * u
@skfem.BilinearForm
def b23(u, v, w):
return A23(w) * u * v
print("* Init Boundary conditions.")
dofs_left = basis["u"].find_dofs(
{"left": m.facets_satisfying(lambda x: x[0] <= 0.0)},
skip=["u^2", "u^3"]
)
dofs_bottom = basis["u"].find_dofs(
{"bottom": m.facets_satisfying(lambda x: x[1] <= 0.0)},
skip=["u^1", "u^3"]
)
dofs_back = basis["u"].find_dofs(
{"back": m.facets_satisfying(lambda x: x[2] <= 0.0)},
skip=["u^1", "u^2"]
)
dofs_front = basis["u"].find_dofs(
{"front": m.facets_satisfying(lambda x: np.abs(x[2] - 1.0) <= 0.0)},
skip=["u^3"]
)
dofs_move = basis["u"].find_dofs(
{"move": m.facets_satisfying(lambda x: np.abs(x[2] - 1.0) <= 0.0)},
skip=["u^1", "u^2"]
)
print("* Assemble boundaries conditions.")
dofs = {}
for dof in [dofs_left,
dofs_bottom,
dofs_back,
dofs_front,
dofs_move,
]:
dofs.update(dof)
print("* Obtain degrees of freedom to keep.")
I = np.hstack((
basis["u"].complement_dofs(dofs),
basis["u"].N + np.arange(basis["p"].N),
basis["u"].N + basis["p"].N + np.arange(basis["J"].N)
))
print("* Init total and incremental unknowns.")
u = basis["u"].zeros()
p = basis["p"].zeros()
J = basis["J"].zeros()
δu = basis["u"].zeros()
δp = basis["p"].zeros()
δJ = basis["J"].zeros()
print("* Initial assembly of stiffness matrix.")
asmprops = {
"displacements": basis["u"].interpolate(u),
"pressure": basis["p"].interpolate(p),
"volumeratio": basis["J"].interpolate(J)
}
K11 = skfem.asm(b11, basis["u"], basis["u"], **asmprops)
K22 = skfem.asm(b22, basis["p"], basis["p"], **asmprops)
K33 = skfem.asm(b33, basis["J"], basis["J"], **asmprops)
K12 = skfem.asm(b12, basis["p"], basis["u"], **asmprops)
K13 = skfem.asm(b13, basis["J"], basis["u"], **asmprops)
K23 = skfem.asm(b23, basis["J"], basis["p"], **asmprops)
K = bmat(
[[K11, K12, K13],
[K12.T, K22, K23],
[K13.T, K23.T, K33]], "csr"
)
print("* Initial assembly of equilibrium equations.\n")
f1 = skfem.asm(a1, basis["u"], **asmprops)
f2 = skfem.asm(a2, basis["p"], **asmprops)
f3 = skfem.asm(a3, basis["J"], **asmprops)
f = np.concatenate((f1,f2,f3))
# init array for "Reaction Force Z" on boundary "move" for all increments
move_fz = np.zeros_like(move_uz)
# increment loop for ramping up "External Displacement Z" on "move"
for increment, move in enumerate(move_uz[1:]):
print(f"Increment {(increment+1):2d}")
print("============")
print(f'* prescribed displacement Z on boundary "move" U_Z={move:1.3e}''')
# newton-rhapson iteration loop
print("\nNewton-Rhapson iterations")
print("-------------------------")
# init converged flag
converged = False
# Newton-Rhapson iteration loop
for iteration in range(maxiter):
# enforce displacements on boundary "move"
δupJ_prescribed = np.hstack((u,p,J))
δupJ_prescribed[dofs["move"].all()] = move - δupJ_prescribed[dofs["move"].all()]
#if not np.allclose(δupJ_prescribed[dofs["move"].all()],0):
# print(f'* increase incremental displacements on boundary "move" δU_Z={δupJ_prescribed[dofs["move"].all()][0]:1.3e}\n')
# solve system
system = skfem.condense(K, -f, x=δupJ_prescribed, I=I)
uvpJ = skfem.solve(*system, use_umfpack=True)
δu, pJ = np.split(uvpJ, [u.shape[0]])
δp, δJ = np.split(pJ, [p.shape[0]])
# calculate norms of unknowns
norm_δu = np.linalg.norm(δu)
norm_δp = np.linalg.norm(δp)
norm_δJ = np.linalg.norm(δJ)
# check if solution is valid
if np.isnan(norm_δu) or np.isnan(norm_δp) or np.isnan(norm_δJ):
break
# update unknowns
u += δu
p += δp
J += δJ
# keyword arguments for assembly
asmprops = {
"displacements": basis["u"].interpolate(u),
"pressure": basis["p"].interpolate(p),
"volumeratio": basis["J"].interpolate(J)
}
# update equilibrium equations
f1 = skfem.asm(a1, basis["u"], **asmprops)
f2 = skfem.asm(a2, basis["p"], **asmprops)
f3 = skfem.asm(a3, basis["J"], **asmprops)
f = np.concatenate((f1,f2,f3))
# get active displacement dofs
dof1 = basis["u"].complement_dofs(dofs)
# reference force as norm of nodals reaction forces at prescribed dofs
f1_ref = np.linalg.norm(np.delete(f1,dof1))
# check nodal equilibrium for nodal force residuals at active dofs
norm_f1 = np.linalg.norm(f1[dof1]) / f1_ref
# update stiffness
K11 = skfem.asm(b11, basis["u"], basis["u"], **asmprops)
K22 = skfem.asm(b22, basis["p"], basis["p"], **asmprops)
K33 = skfem.asm(b33, basis["J"], basis["J"], **asmprops)
K12 = skfem.asm(b12, basis["p"], basis["u"], **asmprops)
K13 = skfem.asm(b13, basis["J"], basis["u"], **asmprops)
K23 = skfem.asm(b23, basis["J"], basis["p"], **asmprops)
# build sparse stiffness matrix from sparse sub-blocks
K = bmat(
[[K11, K12, K13],
[K12.T, K22, K23],
[K13.T, K23.T, K33]], "csr"
)
print(f"#{iteration+1:2d}: |f|={norm_f1:1.3e} (|δu|={norm_δu:1.3e} |δp|={norm_δp:1.3e} |δJ|={norm_δJ:1.3e})")
if norm_f1 < tol:
converged = True
move_fz[increment+1] = np.sum(f1[dofs["move"].all()])
print(f'\n* final "Reaction Force Z" on boundary "move" F_Z={move_fz[increment+1]:1.3e}\n')
break
if np.isnan(norm_δu) or not converged:
# reset counter for last converged increment and break
increment = increment - 1
break
else:
# export results to xdmf-file and go to next increment
savedata = {"u": u[basis["u"].nodal_dofs].T,
#"p": p[basis["p"].element_dofs[0]],
#"J": J[basis["J"].element_dofs[0]]
}
``` |
Beside the improvements in execution time the commits look good. Thank you for your efforts! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure about the test-case but the code changes / extensions are fine. Thanks!
The tests are basically simple proofs of correctness of the implementation, by comparing |
Are you saying that the improvement in execution time is not as good as when using joblib? Maybe there is some room for improvement here. Edit: I mean, "8.9s in parallel vs. 13.7s in default mode." is about 1.5x improvement vs. in #450 you said "about 2x" improvement. |
Could the issue be that my variant passes all the input data to all the threads whereas your joblib variant passes only the data that the specific thread needs? |
Also, I think the joblib version always creates as many threads as there are entries in the local stiffness matrix? |
I think here the best performance can be obtained by verifying that the input data gets split evenly to the threads, so in this Laplace example the local stiffness matrix has 100 entries so dividing 100 / nthreads should be an integer. |
As discussed in #450. Here is a snippet that I used to verify the performance gain: