Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InteriorBasis slow for Hex1 #282

Closed
gdmcbain opened this issue Dec 12, 2019 · 22 comments · Fixed by #283
Closed

InteriorBasis slow for Hex1 #282

gdmcbain opened this issue Dec 12, 2019 · 22 comments · Fixed by #283

Comments

@gdmcbain
Copy link
Contributor

While working with MeshHex on #279 and #281, I noticed that InteriorBasis seemed slow for MeshHex and ElementHex1.

I solved a Laplace equation (like ex13) and this was the bottleneck; e.g. for mesh.p.shape[1] == basis.N == 34001:

  • load: 0.6 s
  • InteriorBasis: 82.5 s
  • asm: 8.2s
  • preconditioner: 0.05 s (pyamgcl.amgcl)
  • solve: 0.36 s (solver_iter_pcg)

So the algebraic multigrid #264 is great and assembly is O. K. but building the basis is slow.

I've started to quantify this by comparing the time for InteriorBasis with MeshTet and ElementTetP1 using .init_tensor with the same points (and so same basis.N but with five times as many tetrahedral elements).

 from pathlib import Path
from time import time

from matplotlib.pyplot import subplots
import numpy as np

import skfem

number = []
tetrahedral = []
hexahedral = []
for n in range(2, 15):
    times = []
    points = [np.arange(n)]*3
    for cls, elt in [(skfem.MeshTet, skfem.ElementTetP1()),
                     (skfem.MeshHex, skfem.ElementHex1())]:
        mesh = cls.init_tensor(*points)
        tic = time()
        basis = skfem.InteriorBasis(mesh, elt)
        times.append(time() - tic)
    number.append(basis.N)
    tetrahedral.append(times[-2])
    hexahedral.append(times[-1])

fig, ax = subplots()
fig.suptitle('InteriorBasis')
ax.loglog(number, hexahedral, linestyle='none', marker='s',
        label=skfem.MeshHex.name)
ax.loglog(number, tetrahedral, linestyle='none', marker='^',
        label=skfem.MeshTet.name)
ax.set_xlabel('basis.N')
ax.set_ylabel('time / s')
ax.legend()
fig.savefig(Path(__file__).with_suffix('.png'))

The hexahedral case is a hundred times slower.

hex_basis

@kinnala
Copy link
Owner

kinnala commented Dec 13, 2019 via email

@gdmcbain
Copy link
Contributor Author

Good idea. I'll start with that on Monday.

@gdmcbain
Copy link
Contributor Author

That helps a bit, but even in reducing to intorder=6 to 2 to match ElementTetP1, the hexahedral case is still markedly slower. This is even though, for the same name of mesh points (.mesh.p.shape[1] or .N) the hexahedral case has a fifth less quadrature points (a fifth .nelems, twice .Nbfun, twice len(W))

hex_basis

@gdmcbain
Copy link
Contributor Author

The issue isn't strictly just ElementHex1; similarly ElementQuad1 is noticeably slower that ElementTriP1 for the same .N. For the five basic linear meshes with 2¹⁸ = (2⁹)² = (2⁶)³ points and intorder=2:

name N Nbfun (ndim, nelems, len(W)) t/s
One-dimensional 262144 2 (1, 262143, 2) 0.15467357635498047
Triangular 262144 3 (2, 522242, 3) 2.0763657093048096
Quadrilateral 262144 4 (2, 261121, 4) 5.525990009307861
Tetrahedral 262144 4 (3, 1250235, 4) 7.537903785705566
Hexahedral 262144 8 (3, 250047, 8) 95.00305962562561

@kinnala
Copy link
Owner

kinnala commented Dec 16, 2019 via email

@gdmcbain
Copy link
Contributor Author

Yes.

cProfile reveals that the issue is MappingIsoparametric, as used by MeshQuad and MeshHex and not MeshLine, MeshTri, or MeshTet.

Of the 95.0 s to build the hexahedral InteriorBasis, 88.0 s are spent in MappingIsoParametric.J.

@kinnala
Copy link
Owner

kinnala commented Dec 16, 2019 via email

@gdmcbain
Copy link
Contributor Author

Yes.

This is 431 calls to MappingIsoparametric.J from MappingIsoparametric.invDF and MappingIsoparametric.detDF.

I wonder whether the values returned by J can be cached? That is, the values of the Jacobian on the quadrature points.

@kinnala
Copy link
Owner

kinnala commented Dec 16, 2019 via email

@gdmcbain
Copy link
Contributor Author

I'm looking into whether it would be possible to use numpy.linalg.tensorinv instead of Cramer's rule.

@gdmcbain
Copy link
Contributor Author

gdmcbain commented Dec 16, 2019

Or since with numpy.linalg.inv

Inverses of several matrices can be computed at once

something like

np.linalg.inv(jac.transpose(2, 3, 0, 1)).transpose(2, 3, 0, 1)

@gdmcbain
Copy link
Contributor Author

That's (a7a129f) much more concise for MappingIsoparametric.invDF but it doesn't make much difference to the speed.

It also fails six tests!

@gdmcbain
Copy link
Contributor Author

Another permutation of the retranposition (be1e652) worked better. (Passing the tests, not speeding anything).

@gdmcbain
Copy link
Contributor Author

The vectorized version also has the advantage of being independent of MappingIsoparametric.dim.

@kinnala
Copy link
Owner

kinnala commented Dec 16, 2019

Great. Let's dig further into it.

It seems to me also that both of these Mapping classes need some additional love, possibly partial rewrite to get rid of these dimension-dependent parts and use more np.einsum.

@gdmcbain
Copy link
Contributor Author

Unfortunately the vectorized reimplementation be1e652 of MappingIsoparametric.invDF is actually slower. Back on the original MeshHex with .p.shape[1] == 34001, there are 8 calls to invDF and collectively these take:

  • 14.2 s on 282-inv-jac at be1e652 using numpy.linalg.inv (which accounts for 12.4 s in 8 calls)
  • 10.4 s on master at 7f7a95b using Cramer's rule

This is despite the time spent in MappingIsoparametric.J being reduced from 10.6 s (446 calls) to 2.5 s (102 calls).

@gdmcbain
Copy link
Contributor Author

gdmcbain commented Dec 17, 2019

It seems that for most of the common cases, all the rows of .basis[i][0] are the same.

@gdmcbain
Copy link
Contributor Author

…in the scalar case, by which I mean, e.g.

all(np.allclose(ex01.basis.basis[i][0], ex01.e.lbasis(ex01.basis.X, i)[0]) for i in range(ex01.basis.Nbfun))

@gdmcbain
Copy link
Contributor Author

…even when the mapping is isoparametric and the elements aren't congruent, e.g. an irregular MeshQuad from pygmsh

all(np.allclose(ex17.basis.basis[i][0], 
                       ex17.element.lbasis(ex17.basis.X, i)[0]) 
     for i in range(ex17.basis.Nbfun))

@gdmcbain
Copy link
Contributor Author

Ah, I see, this is expected and elegantly handled by the use of numpy.broadcast_to in ElementH1.gbasis:

if len(X.shape) == 2:
return np.broadcast_to(phi, (invDF.shape[2], invDF.shape[3])),\
np.einsum('ijkl,il->jkl', invDF, dphi)
elif len(X.shape) == 3:
return np.broadcast_to(phi, (invDF.shape[2], invDF.shape[3])),\
np.einsum('ijkl,ikl->jkl', invDF, dphi)

@kinnala
Copy link
Owner

kinnala commented Dec 17, 2019

I made some ugly optimizations and got to this:
geordie
I don't know if it's any better what you got. Going to continue later.

@gdmcbain
Copy link
Contributor Author

Ha, yes, it's much better. My ‘optimization’ though not ugly made the code slower. This looks like roughly a factor of ten faster than master. I had been hoping to exploit the redundancy of phi between elements but see that that's already accounted for. I haven't looked as closely at dphi yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants