InteriorBasis slow for Hex1 #282

gdmcbain · 2019-12-12T23:50:21Z

While working with MeshHex on #279 and #281, I noticed that InteriorBasis seemed slow for MeshHex and ElementHex1.

I solved a Laplace equation (like ex13) and this was the bottleneck; e.g. for mesh.p.shape[1] == basis.N == 34001:

load: 0.6 s
InteriorBasis: 82.5 s
asm: 8.2 s
preconditioner: 0.05 s (pyamgcl.amgcl)
solve: 0.36 s (solver_iter_pcg)

So the algebraic multigrid (#264) is great and assembly is OK, but building the basis is slow.

I've started to quantify this by comparing the time for InteriorBasis with MeshTet and ElementTetP1 using .init_tensor with the same points (and so same basis.N but with five times as many tetrahedral elements).

from pathlib import Path
from time import time

from matplotlib.pyplot import subplots
import numpy as np

import skfem

number = []
tetrahedral = []
hexahedral = []
for n in range(2, 15):
    times = []
    points = [np.arange(n)]*3
    for cls, elt in [(skfem.MeshTet, skfem.ElementTetP1()),
                     (skfem.MeshHex, skfem.ElementHex1())]:
        mesh = cls.init_tensor(*points)
        tic = time()
        basis = skfem.InteriorBasis(mesh, elt)
        times.append(time() - tic)
    number.append(basis.N)
    tetrahedral.append(times[-2])
    hexahedral.append(times[-1])

fig, ax = subplots()
fig.suptitle('InteriorBasis')
ax.loglog(number, hexahedral, linestyle='none', marker='s',
        label=skfem.MeshHex.name)
ax.loglog(number, tetrahedral, linestyle='none', marker='^',
        label=skfem.MeshTet.name)
ax.set_xlabel('basis.N')
ax.set_ylabel('time / s')
ax.legend()
fig.savefig(Path(__file__).with_suffix('.png'))

The hexahedral case is a hundred times slower.

kinnala · 2019-12-13T07:12:37Z

Try reducing the integration order. The default might be too high for most use cases.

gdmcbain · 2019-12-14T04:36:29Z

Good idea. I'll start with that on Monday.

gdmcbain · 2019-12-16T04:43:27Z

That helps a bit, but even after reducing intorder from 6 to 2 to match ElementTetP1, the hexahedral case is still markedly slower. This is even though, for the same number of mesh points (.mesh.p.shape[1] or .N), the hexahedral case has fewer quadrature points (a fifth of the .nelems but twice the .Nbfun and twice the len(W), so about two-fifths as many in total).

gdmcbain · 2019-12-16T04:56:24Z

The issue isn't strictly just ElementHex1; similarly, ElementQuad1 is noticeably slower than ElementTriP1 for the same .N. For the five basic linear meshes with 2¹⁸ = (2⁹)² = (2⁶)³ points and intorder=2:

name             N       Nbfun  (ndim, nelems, len(W))  t/s
One-dimensional  262144  2      (1, 262143, 2)          0.15467357635498047
Triangular       262144  3      (2, 522242, 3)          2.0763657093048096
Quadrilateral    262144  4      (2, 261121, 4)          5.525990009307861
Tetrahedral      262144  4      (3, 1250235, 4)         7.537903785705566
Hexahedral       262144  8      (3, 250047, 8)          95.00305962562561
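
The element counts in the table can be checked with a little arithmetic. The sketch below assumes, as the numbers suggest, that the tensor-product construction splits each square into two triangles and each cube into five tetrahedra; `quad_points` is just an illustrative name for nelems × len(W).

```python
# Element counts for 2**18 points arranged as a line, a 512 x 512 grid or a 64**3 grid.
n_points = 2 ** 18
n2 = 2 ** 9   # points per side in two dimensions
n3 = 2 ** 6   # points per side in three dimensions

# name: (nelems, len(W)); the len(W) values for intorder=2 are copied from the table
meshes = {
    "One-dimensional": (n_points - 1, 2),
    "Triangular": (2 * (n2 - 1) ** 2, 3),
    "Quadrilateral": ((n2 - 1) ** 2, 4),
    "Tetrahedral": (5 * (n3 - 1) ** 3, 4),
    "Hexahedral": ((n3 - 1) ** 3, 8),
}

# Total number of quadrature points over the whole mesh
quad_points = {name: nelems * nw for name, (nelems, nw) in meshes.items()}
for name, total in quad_points.items():
    print(f"{name:16s} nelems={meshes[name][0]:8d} quadrature points={total:8d}")

# Hexahedral total relative to tetrahedral total
ratio = quad_points["Hexahedral"] / quad_points["Tetrahedral"]
print(ratio)
```

So the hexahedral basis evaluates at fewer points in total than the tetrahedral one, which makes its much longer build time stand out all the more.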

kinnala · 2019-12-16T05:10:54Z

Is this line the culprit? https://github.com/kinnala/scikit-fem/blob/48c836cd1dafcba54086428fad214f5f65083076/skfem/assembly/global_basis/interior_basis.py#L95 Could look into the possibility of parallelizing it. Not sure yet how that would happen.

gdmcbain · 2019-12-16T05:11:48Z

Yes.

cProfile reveals that the issue is MappingIsoparametric, as used by MeshQuad and MeshHex and not MeshLine, MeshTri, or MeshTet.

Of the 95.0 s to build the hexahedral InteriorBasis, 88.0 s are spent in MappingIsoparametric.J.
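
For anyone wanting to reproduce this kind of measurement, here is a minimal cProfile sketch. The two functions are hypothetical stand-ins (not scikit-fem code) for a constructor that repeatedly calls an expensive routine; in practice the call between `enable()` and `disable()` would be the InteriorBasis construction.

```python
import cProfile
import io
import pstats


def jacobian_like(n):
    # Hypothetical stand-in for an expensive routine that is called many times.
    return sum(i * i for i in range(n))


def build_basis_like():
    # Hypothetical stand-in for a constructor that calls the routine repeatedly.
    return [jacobian_like(10_000) for _ in range(50)]


profiler = cProfile.Profile()
profiler.enable()
build_basis_like()
profiler.disable()

# Collect the report in a string, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)  # ten most expensive entries
report = stream.getvalue()
print(report)
```

The ncalls and cumtime columns of the report are where call counts and per-function totals like those quoted in this thread come from.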

kinnala · 2019-12-16T05:18:15Z

This Jacobian: https://github.com/kinnala/scikit-fem/blob/48c836cd1dafcba54086428fad214f5f65083076/skfem/mapping/mapping_isoparametric.py#L65 is necessarily more complicated than in the affine case because it depends on the quadrature point location on the reference element. Maybe there is room for optimization though.
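
To illustrate the dependence on the reference point, here is a small NumPy sketch for a single bilinear quadrilateral. It assumes a reference square [0, 1]² with counter-clockwise vertex ordering; `bilinear_jacobian` is a hypothetical helper, not the scikit-fem implementation.

```python
import numpy as np


def bilinear_jacobian(vertices, xi, eta):
    """Jacobian of the bilinear map from the reference square [0, 1]^2.

    vertices: (4, 2) array of corners in counter-clockwise order.
    Returns a (2, 2) array with J[i, j] = d x_i / d xi_j.
    """
    # Derivatives of the four bilinear shape functions with respect to (xi, eta)
    dN = np.array([
        [-(1 - eta), -(1 - xi)],
        [1 - eta, -xi],
        [eta, xi],
        [-eta, 1 - xi],
    ])
    return vertices.T @ dN


parallelogram = np.array([[0., 0.], [2., 0.], [3., 1.], [1., 1.]])
trapezoid = np.array([[0., 0.], [2., 0.], [1.5, 1.], [0., 1.]])

# Two different points on the reference element
a, b = (0.2, 0.3), (0.8, 0.9)

# On a parallelogram the map is affine, so the Jacobian is the same everywhere.
same_on_parallelogram = np.allclose(bilinear_jacobian(parallelogram, *a),
                                    bilinear_jacobian(parallelogram, *b))
# On a general quadrilateral the Jacobian changes from point to point.
same_on_trapezoid = np.allclose(bilinear_jacobian(trapezoid, *a),
                                bilinear_jacobian(trapezoid, *b))
print(same_on_parallelogram, same_on_trapezoid)
```

An affine mapping can therefore store one Jacobian per element, whereas an isoparametric mapping needs one per element and quadrature point.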

gdmcbain · 2019-12-16T05:18:49Z

Yes.

This is 431 calls to MappingIsoparametric.J from MappingIsoparametric.invDF and MappingIsoparametric.detDF.

I wonder whether the values returned by J can be cached? That is, the values of the Jacobian on the quadrature points.

kinnala · 2019-12-16T05:21:20Z

Good idea. I was just looking at this: https://github.com/kinnala/scikit-fem/blob/48c836cd1dafcba54086428fad214f5f65083076/skfem/mapping/mapping_isoparametric.py#L181 Same parameters get called over and over again.
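
A minimal sketch of what caching repeated evaluations could look like. `CachedJacobian`, `slow_J` and the `(i, j, X)` signature are hypothetical illustrations; this is not the code that was eventually merged.

```python
import numpy as np


class CachedJacobian:
    """Hypothetical wrapper that memoizes an expensive function of (i, j, X)."""

    def __init__(self, compute):
        self._compute = compute
        self._cache = {}
        self.misses = 0  # number of times the wrapped function actually ran

    def __call__(self, i, j, X):
        # Key on the indices and on the contents of X, so equal arrays share an entry.
        key = (i, j, X.shape, X.tobytes())
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._compute(i, j, X)
        return self._cache[key]


def slow_J(i, j, X):
    # Placeholder for the real Jacobian entry; any deterministic function will do.
    return (i + 1) * X[0] + (j + 1) * X[1]


J = CachedJacobian(slow_J)
X = np.array([[0.25, 0.75], [0.25, 0.75]])  # quadrature points, shape (dim, nqp)

for _ in range(100):  # e.g. repeated requests from different callers
    for i in range(2):
        for j in range(2):
            J(i, j, X)

print(J.misses)  # 4: one evaluation per (i, j) pair
```

Hashing the array contents costs time proportional to the size of X, so this only pays off when the cached computation is much more expensive than the lookup.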

gdmcbain · 2019-12-16T05:32:23Z

I'm looking into whether it would be possible to use numpy.linalg.tensorinv instead of Cramer's rule.

gdmcbain · 2019-12-16T05:44:33Z

Or, since with numpy.linalg.inv "inverses of several matrices can be computed at once", something like

np.linalg.inv(jac.transpose(2, 3, 0, 1)).transpose(2, 3, 0, 1)

gdmcbain · 2019-12-16T06:00:06Z

That's (a7a129f) much more concise for MappingIsoparametric.invDF but it doesn't make much difference to the speed.

It also fails six tests!

gdmcbain · 2019-12-16T06:03:10Z

Another permutation of the retransposition (be1e652) worked better. (It passes the tests but doesn't speed anything up.)

gdmcbain · 2019-12-16T06:04:48Z

The vectorized version also has the advantage of being independent of MappingIsoparametric.dim.
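
A sketch of the idea on synthetic data, assuming the Jacobians are stored with shape (dim, dim, nelems, nqp). The axis ordering is an assumption of this example only; as noted above, the ordering needed care in the real code. `inv_batched` and `inv_cramer_2d` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)


def inv_batched(jac):
    """Invert jac[:, :, k, l] for every element k and quadrature point l."""
    # np.linalg.inv acts on the last two axes, so move the matrix axes there and back.
    return np.linalg.inv(jac.transpose(2, 3, 0, 1)).transpose(2, 3, 0, 1)


def inv_cramer_2d(jac):
    """Explicit 2 x 2 inverse (Cramer's rule); only valid for dim == 2."""
    det = jac[0, 0] * jac[1, 1] - jac[0, 1] * jac[1, 0]
    out = np.empty_like(jac)
    out[0, 0] = jac[1, 1] / det
    out[0, 1] = -jac[0, 1] / det
    out[1, 0] = -jac[1, 0] / det
    out[1, 1] = jac[0, 0] / det
    return out


nelems, nqp = 5, 4
# Well-conditioned test matrices: identity plus a small random perturbation
jac2 = np.eye(2)[:, :, None, None] + 0.1 * rng.standard_normal((2, 2, nelems, nqp))
jac3 = np.eye(3)[:, :, None, None] + 0.1 * rng.standard_normal((3, 3, nelems, nqp))

# In two dimensions both routes agree.
agree_2d = np.allclose(inv_batched(jac2), inv_cramer_2d(jac2))

# The same function handles dim == 3; check that inverse times matrix is the identity.
product = np.einsum('ijkl,jmkl->imkl', inv_batched(jac3), jac3)
identity_3d = np.allclose(product, np.eye(3)[:, :, None, None])
print(agree_2d, identity_3d)
```

Being dimension-independent does not by itself make the batched call faster than the hand-written formula, so it is worth timing both on realistic sizes.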

kinnala · 2019-12-16T06:34:57Z

Great. Let's dig further into it.

It seems to me also that both of these Mapping classes need some additional love, possibly a partial rewrite to get rid of these dimension-dependent parts and use more np.einsum.

gdmcbain · 2019-12-16T22:50:49Z

Unfortunately the vectorized reimplementation be1e652 of MappingIsoparametric.invDF is actually slower. Back on the original MeshHex with .p.shape[1] == 34001, there are 8 calls to invDF and collectively these take:

14.2 s on 282-inv-jac at be1e652 using numpy.linalg.inv (which accounts for 12.4 s in 8 calls)
10.4 s on master at 7f7a95b using Cramer's rule

This is despite the time spent in MappingIsoparametric.J being reduced from 10.6 s (446 calls) to 2.5 s (102 calls).

gdmcbain · 2019-12-17T19:56:03Z

It seems that for most of the common cases, all the rows of .basis[i][0] are the same.

gdmcbain · 2019-12-17T21:10:08Z

…in the scalar case, by which I mean, e.g.

all(np.allclose(ex01.basis.basis[i][0], ex01.e.lbasis(ex01.basis.X, i)[0]) for i in range(ex01.basis.Nbfun))

gdmcbain · 2019-12-17T21:20:55Z

…even when the mapping is isoparametric and the elements aren't congruent, e.g. an irregular MeshQuad from pygmsh

all(np.allclose(ex17.basis.basis[i][0], 
                       ex17.element.lbasis(ex17.basis.X, i)[0]) 
     for i in range(ex17.basis.Nbfun))

gdmcbain · 2019-12-17T21:28:59Z

Ah, I see, this is expected and elegantly handled by the use of numpy.broadcast_to in ElementH1.gbasis:

scikit-fem/skfem/element/element_h1.py

Lines 11 to 16 in 48c836c

    
        if len(X.shape) == 2:
            return np.broadcast_to(phi, (invDF.shape[2], invDF.shape[3])),\
                   np.einsum('ijkl,il->jkl', invDF, dphi)
        elif len(X.shape) == 3:
            return np.broadcast_to(phi, (invDF.shape[2], invDF.shape[3])),\
                   np.einsum('ijkl,ikl->jkl', invDF, dphi)
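
To see what the first branch does, here is a standalone sketch on random arrays of the corresponding shapes. The sizes and variable names are made up for the example and nothing here touches scikit-fem itself.

```python
import numpy as np

nelems, nqp, dim = 1000, 4, 2
rng = np.random.default_rng(0)

phi = rng.random(nqp)                        # reference values at the quadrature points
dphi = rng.random((dim, nqp))                # reference gradients
invDF = rng.random((dim, dim, nelems, nqp))  # one matrix per element and point

# broadcast_to returns a read-only view: every row aliases the same nqp numbers.
phi_global = np.broadcast_to(phi, (nelems, nqp))
print(phi_global.shape, phi_global.strides)  # the stride along the element axis is 0

# Contract the reference-gradient index i with the first index of invDF.
grad_global = np.einsum('ijkl,il->jkl', invDF, dphi)

# The same contraction written as explicit loops over i and j, for comparison.
expected = np.zeros((dim, nelems, nqp))
for i in range(dim):
    for j in range(dim):
        expected[j] += invDF[i, j] * dphi[i]

print(np.allclose(grad_global, expected))
```

The values therefore cost no extra memory per element, while the gradients genuinely differ between elements because invDF does.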

kinnala · 2019-12-17T22:07:31Z

I made some ugly optimizations and got to this:

I don't know if it's any better than what you got. Going to continue later.

gdmcbain · 2019-12-17T22:30:58Z

Ha, yes, it's much better. My ‘optimization’, though not ugly, made the code slower. This looks like roughly a factor of ten faster than master. I had been hoping to exploit the redundancy of phi between elements but see that that's already accounted for. I haven't looked as closely at dphi yet.

kinnala mentioned this issue Dec 18, 2019

Optimize mapping isoparam by caching #283

Merged

gdmcbain closed this as completed in #283 Dec 18, 2019
