Magnetic reflectivity py #131

Merged: 69 commits merged into master from magnetic_reflectivity_py on Sep 24, 2021
Conversation

bmaranville (Member Author):
This was an interesting experiment... I copied our magnetic_amplitude code from C back into Python (almost verbatim) and compiled it with numba (ahead-of-time compilation).

The resulting calculations on a real model are around 20% faster with numba than with the C extension (the first timing below is for the C extension, the second for the numba AOT extension):

(refl1d-3.8) brian@cheery:~/dev/refl1d-modelbuilder/helical_simulation$ time refl1d  ~/Downloads/Combined_Simple_2Gap_Top.txt --store=path --time_model --steps=1000
time per model eval: 41.836 ms

real	0m43.111s
user	0m43.218s
sys	0m0.314s
(refl1d-3.8) brian@cheery:~/dev/refl1d-modelbuilder/helical_simulation$ time refl1d  ~/Downloads/Combined_Simple_2Gap_Top.txt --store=path --time_model --steps=1000
loaded from compiled module
time per model eval: 34.6037 ms

real	0m36.062s
user	0m36.163s
sys	0m0.335s

The code is not really ready to use - the import-or-compile dance for the module is a bit awkward as written.
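For context, a minimal sketch of the kind of import-or-compile fallback being described (module and symbol names here are illustrative, not the actual layout in this PR):

# Hypothetical sketch of the import-or-compile dance; names are illustrative.
try:
    # Use the ahead-of-time compiled extension if it was already built.
    import magnetic_amplitude as mag_module
    print("loaded from compiled module")
except ImportError:
    # Otherwise build it now with numba.pycc (a one-time, slow step) and retry.
    from magnetic_amplitude_src import cc  # the CC('magnetic_amplitude') object
    cc.compile()
    import magnetic_amplitude as mag_module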

The optimization of just adding a fixed offset when all the theta_offsets are shared
in a PolarizedNeutronProbe (PNP) object does not work, particularly if the data needs to be oversampled.

This commit removes the broken optimized version and instead adds an
"oversampling" argument to the init of PolarizedNeutronProbe (usage sketch below).

If the theta_offsets change (e.g. if they are fit parameters), the measurement_union and
the resampling are recalculated. This is slow, but there may be no way around it
without a mesh of theta/lambda values to interpolate from.
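A usage sketch of the new argument, assuming the usual four cross-section probes (the probe names and oversampling value here are illustrative):

# Hypothetical usage; pp, pm, mp, mm are the four cross-section probes.
probe = PolarizedNeutronProbe(xs=(pp, pm, mp, mm), oversampling=10)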
#print "amplitude", depth, rho, kz, rho_index
#print depth.shape, sigma.shape, rho.shape, irho.shape, kz.shape
# print "amplitude", depth, rho, kz, rho_index
# print depth.shape, sigma.shape, rho.shape, irho.shape, kz.shape
pkienzle (Member):
Personal style: I use no spaces when commenting out code vs. one space when writing comments about code. Black will of course ignore me and put a space in there.

bmaranville (Member Author):

The auto code formatter in VS Code puts in the space... and if we are going to try to follow PEP 8, I think it's too much work to fight both battles.

pkienzle (Member):

My solution would be to avoid the VS Code formatter, since it has already demonstrated in other contexts that it doesn't know how to format Python code.

bmaranville (Member Author):

If you can suggest an alternative formatter to use in VS Code I would be happy to use it, but when formatting thousands of lines we can't really do it by hand. I just checked, and what I'm using now is "autopep8"; I also have "black" and "yapf".
I think some of the earlier issues I had with it were user error (on my part). If you can find errors in the many "pep8" commits that I just pushed, I will adjust my settings.

bmaranville (Member Author):

My earlier point was that it is unsustainable to try to maintain a style that is not automatically respected by autoformatters (or automatically enforced by them). If we are going to enforce style, we have to rely on the tools.

pkienzle (Member):

The main tool I've used on other projects is lint, with periodic manual cleanup of lint errors. We had a discussion on the sasmodels repo about a tool to check whether a commit increases the lint count and to flag those that do. It seemed mostly doable, but we didn't act on it.

Regarding auto-formatters: PEP 8 was intended to be prescriptive, not descriptive. Code "readability" trumps formatting consistency. When VS Code breaks a line in the middle of an expression rather than breaking at a comma, that hurts readability. If you would like to use a code formatter, please look for one that knows how to format code.

pkienzle (Member):

I see that the settings let you choose between black, autopep8, and yapf. I haven't tried any of them.

# kz = -kz
# depth, rho, irho, sigma, rhoM, thetaM = [v[::-1] for v in (depth, rho, irho, sigma, rhoM, thetaM)]
depth, rho, irho, sigma = [_dense(a, 'd')
                           for a in (depth, rho, irho, sigma)]
pkienzle (Member), Jul 9, 2021:

Maybe indent as:

LHS = [
    item for a in list]

or if item or list is a long expression, then

LHS = [
    item
    for a in list
]

I find this makes for less work when changing LHS target.

refl1d/reflectivity.py (outdated; thread resolved)
YD[i] = (B23*B42-B43*B22)/DETW # --

@cc.export('mag_amplitude', 'void(f8[:], f8[:], f8[:], f8[:], f8[:], c16[:], c16[:], f8, f8[:], i4[:], c16[:], c16[:], c16[:], c16[:])')
@njit(parallel=True, cache=True)
pkienzle (Member):
Put the type signature with the njit rather than the cc.
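A sketch of one reading of this suggestion: declare the signature once, attach it to @njit, and let @cc.export reuse the same constant (parameter names are illustrative):

MAGAMP_SIG = 'void(f8[:], f8[:], f8[:], f8[:], f8[:], c16[:], c16[:], f8, f8[:], i4[:], c16[:], c16[:], c16[:], c16[:])'

@cc.export('mag_amplitude', MAGAMP_SIG)
@njit(MAGAMP_SIG, parallel=True, cache=True)
def mag_amplitude(d, sigma, rho, irho, rhoM, u1, u3, H, kz, rho_index, Ra, Rb, Rc, Rd):
    ...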

bmaranville (Member Author):

I wasn't aware that this would work!

B2SLD = 2.31604654 # Scattering factor for B field 1e-6

from numba.pycc import CC
cc = CC('magnetic_amplitude')
pkienzle (Member):

Any reason to do AOT compile rather than simply shipping numba and letting it use whatever optimization is available for the platform it is running on? Sounds like less work to me. Startup times will be slightly slower, but not an issue when fitting since the compile only needs to happen once. Or maybe once per process: I don't know how numba interacts with multiprocessing.

Note that numba is able to parallelize across cores and sockets on the same machine, so it may be faster to run bumps single threaded with the nllf running in parallel rather than running the optimizer in parallel with the nllf single threaded. Or maybe fewer threads than cores to allow some overlap during the non-parallel bits while reducing contention during the parallel bits.
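A self-contained sketch of that JIT-only alternative (a toy kernel, not the real code): numba compiles on first call and caches the result on disk for later runs.

from numba import njit, prange
import numpy as np

@njit(parallel=True, cache=True)  # compiled on first call; reused from the on-disk cache afterwards
def scale(x, a):
    out = np.empty_like(x)
    for i in prange(len(x)):  # numba parallelizes this loop across cores
        out[i] = a * x[i]
    return out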

bmaranville (Member Author):

The startup times were pretty long when using njit without AOT... 10+ seconds on my machine. There are probably things that can be done to speed this up. I found that for whatever reason the njit code was slightly slower than the AOT code, though that's not supposed to be true.

pkienzle (Member):

Caching should take care of that. You are doing this by hand anyway, so let numba take care of it. It seems to work for me. It took 36s to run check_examples.py with njit alone and 38s to run with precompiled, in both cases having updated the cache prior to timing.

bmaranville (Member Author):

I found that it doesn't cache between starts of refl1d - it recompiles every time the program is started, which is potentially very annoying.

pkienzle (Member):

That hasn't been my experience (mac conda, python=3.8, numba=0.53.1). If you touch the file it'll recompile, but otherwise it should use the cache. Any chance you are touching the .py file between runs? [obviously not, but I have to ask...]

There are some issues on this from two years ago. Perhaps they weren't resolved?

Are there any warnings showing up if you run from the command line?

The docs tell you how to set the cache directory. Can you check that the cache is being created?
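One way to check, per the numba docs (the directory path is just an example): set NUMBA_CACHE_DIR before anything compiles, then look for the cache files after a run.

import os
# Must be set before numba compiles anything in the process.
os.environ["NUMBA_CACHE_DIR"] = "/tmp/numba_cache"
# After running a model once, /tmp/numba_cache should contain .nbi/.nbc
# files if on-disk caching is actually working.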

bmaranville (Member Author):

You are correct - the cache seems to be working. I think I was thrown off because I was editing some other parts of the source file (not in the numba functions) and I didn't think that would affect the cache, but it does.

refl1d/reflectivity.py (two more outdated threads, resolved)
bmaranville (Member Author):

I had to fix up zeeman_check.py, but the new numba calculation seems to check out compared to the canonical gepore result:
[figure: zeeman_check comparison showing the new numba calculation matching the canonical gepore result]

pkienzle (Member):

Sometimes shorter code can help the compiler and processor. I saved a little time (5%) by simplifying the B initialization, using the following:

    Z = 0.
    S1LP = -sqrt(complex(PI4*(RHO[L]+RHOM[L]) - E0, -PI4*(fabs(IRHO[L])+EPS)))
    S3LP = -sqrt(complex(PI4*(RHO[L]-RHOM[L]) - E0, -PI4*(fabs(IRHO[L])+EPS)))
    BLP = GLP = 0j
    B11 = B22 = B33 = B44 = 1. + 0j
    B12 = B13 = B14 = B21 = B23 = B24 = B31 = B32 = B34 = B41 = B42 = B43 = 0j
    for I in range(0,N-1):
        ...

Note: I didn't check that the result was correct; I only wanted to see if it would be faster.

This may be worth doing to make the code easier to maintain. Fewer repeated expressions. I'll let you decide.

pkienzle (Member) commented Jul 21, 2021:

Please change dtype='complex128' to dtype=np.complex128. Numba was complaining on one of my environments when given the string version.
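The requested change in isolation:

import numpy as np
points = 100  # illustrative
R = np.empty((points, 4), dtype='complex128')   # string dtype: rejected by numba in some environments
R = np.empty((points, 4), dtype=np.complex128)  # preferred form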

Or we can return a tuple instead of passing in a numpy output array:

CR4XA_SIG = 'UniTuple(c16,4)(i8, f8[:], f8[:], f8, f8[:], f8[:], f8[:], c16[:], c16[:], f8)'

Then use code like the following to receive it:

Ra[i], Rb[i], _, _ = Cr4xa(...)

etc.

No difference in performance.
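A sketch of how Cr4xa would look with the tuple return (argument names assumed from the signature above):

@njit(CR4XA_SIG, cache=True)
def Cr4xa(N, D, SIGMA, IP, RHO, IRHO, RHOM, U1, U3, KZ):
    # ... existing body computing the four amplitudes YA, YB, YC, YD ...
    return YA, YB, YC, YD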

pkienzle (Member):

I seem to be in micro-optimization mode.

It's probably too small to notice, but the following should be slightly faster:

MAGAMP_SIG = 'c16[:,:](f8[:], f8[:], f8[:], f8[:], f8[:], c16[:], c16[:], f8[:])'
@njit(MAGAMP_SIG, parallel=True, cache=True)
def magnetic_amplitude_py(d, sigma, rho, irho, rhoM, u1, u3, KZ):
    layers = len(d)
    points = len(KZ)
    R = np.empty((points, 4), dtype=np.complex128)
    for i in prange(points):
        Cr4xa(layers, d, sigma, 1.0, rho, irho, rhoM, u1, u3, KZ[i], R[i])
    # If magnetism in incident or backing media then recompute with IP=-1.0,
    # assigning to R[2] and R[3] only. This must come after the calculation for IP=1.0,
    # which assigns to all of R[0], R[1], R[2], R[3].
    if abs(rhoM[0]) > MINIMAL_RHO_M or abs(rhoM[layers-1]) > MINIMAL_RHO_M:
        for i in prange(points):
            Cr4xa(layers, d, sigma, -1.0, rho, irho, rhoM, u1, u3, KZ[i], R[i])
    return R

modifying CR4XA to only assign Y[0], Y[1] when IP > 0.

This avoids memory thrashing and copying associated with Y, allocating the entire return vector in one step.
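So the tail of CR4XA would become something like this sketch:

# The IP=-1.0 pass must not clobber the Y[0], Y[1] results from the IP=+1.0 pass.
if IP > 0:
    Y[0] = YA
    Y[1] = YB
Y[2] = YC
Y[3] = YD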

bmaranville (Member Author):

> Sometimes shorter code can help the compiler and processor. I saved a little time (5%) by simplifying the B initialization [...] This may be worth doing to make the code easier to maintain. Fewer repeated expressions. I'll let you decide.

Sure! Do we have a good test for magnetic reflectivity calculations, besides zeeman_check (which is not set up to be a CI test, as far as I can tell)?

pkienzle (Member):

I run the sample models before releasing but only as a smoke test; I don't look at the results.

This is done using python check_examples.py --chisq.

It wouldn't take too much to turn this into a CI test. Just check whether the chisq_str has changed. Meanwhile, you can do this manually using python doc/examples/spinvalve/n101G.py --chisq before and after code mods.
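A rough sketch of what that CI check could look like (the helper and reference names are illustrative):

import subprocess

REFERENCE = {"doc/examples/spinvalve/n101G.py": "[chisq=...]"}  # stored reference strings

def check_chisq(model_path):
    # Run the example with --chisq and flag any change from the stored value.
    out = subprocess.run(["python", model_path, "--chisq"],
                         capture_output=True, text=True).stdout
    assert REFERENCE[model_path] in out, f"chisq changed for {model_path}"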

bmaranville (Member Author):

The numba convolve_gaussian performs almost exactly the same as the (nearly identical) C code from which it was copied. Not very surprising.

bmaranville (Member Author):

@pkienzle this branch now runs and passes all tests without compiling reflmodule.
These have all been ported to numba:

  • convolve, convolve_uniform, convolve_gaussian, convolve_sampled
  • contract_profile (incl. align_magnetic)
  • rebin, rebin2d
  • reflamp, magnetic_amplitude

The setup.py has been modified so it does not try to build the binary extension. It now builds a wheel tagged py3-none-any (a universal Python 3 wheel).

All the chisq tests return the same number as on the reflmodule.so branch, except for doc/examples/freemag/pmf.py.
I haven't figured out why that one is slightly different.

pkienzle (Member) left a review:

Please verify the sign on the complex part of the refractive index for the magnetic version, which at a glance looks inconsistent between the numba and the C code.

We should add python_requires='>=3.8' to setup.py, or make it 3.7 and add 3.7 to the CI tests.

Rename reflmodule.py to something that won't conflict with the old .so file for those of us running an "inplace" version.

Address the minor comments as you see fit and feel free to pull.
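The setup.py addition, in the 3.7 variant discussed below:

from setuptools import setup

setup(
    # ... existing arguments ...
    python_requires='>=3.7',  # or '>=3.8' if we keep 3.8 as the floor
)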

- { os: ubuntu-latest, py: 3.8, doc: 1 }
- { os: windows-latest, py: 3.6, whl: 1 }
- { os: windows-latest, py: 3.7, whl: 1 }
- { os: ubuntu-latest, py: 3.8, doc: 1, whl: 1 }
pkienzle (Member):

Is 3.8 now required? NumPy is sitting at 3.7 as its oldest supported version, so ideally we would match that (though it's not important).

bmaranville (Member Author):

I think I was using 3.8 as the target for the dataclasses branch, but I don't remember why. I don't think it's important - 3.7 is fine.



#@numba.njit(ALIGN_MAGNETIC_SIG, parallel=False, cache=True)
@numba.njit(cache=True)
pkienzle (Member):

Why no function signature?

If it is because the inputs are not of the correct type and we are relying on conversion during the call, then I would prefer to trigger an error here and force the caller to produce the correct types.
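To illustrate the difference (toy example): with an explicit signature, a mismatched argument type raises at call time instead of being silently converted.

import numpy as np
import numba

@numba.njit('f8[:](f8[:])', cache=True)
def double(x):
    return x * 2.0

double(np.arange(3.0))  # ok: float64 input matches f8[:]
double(np.arange(3))    # raises TypeError: no matching definition for int64 input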

bmaranville (Member Author):

For some of these there was explicit polymorphism encoded in the wrapper, and not specifying a signature seemed like a reasonable way to address that in numba. I did some testing and was not able to see any performance difference between signature and no signature. For this function I suspect the usage of the variables is pretty unambiguous to numba.


#@numba.njit(CONTRACT_MAG_SIG, parallel=False, cache=True)
@numba.njit(cache=True)
def contract_mag(d, sigma, rho, irho, rhoM, thetaM, dA):
pkienzle (Member):

I believe this code is broken and the magnetic folk always run with dA=0. I've added a ticket.

@@ -0,0 +1,174 @@
import numba
import numba.experimental
import math
pkienzle (Member):

Rebin is only used for the pulsed instrument simulation code. In practice this is dead code and should be removed. Maybe drop rebin.py and raise a NotImplementedError in the simulation code. No need to do so immediately. I've added a ticket.

Note: it may be worth comparing the performance of this numba code to the numpy code we have in reductus.

refl1d/rebin.py (outdated)
from . import reflmodule as _cmodule

from . import reflmodule
pkienzle (Member):

Lint complains about spaces at the ends of lines.

refl1d/probe.py (thread resolved)
#
S1L = -sqrt(complex(PI4*(RHO[L]+RHOM[L]) - E0, -PI4*(fabs(IRHO[L])+EPS)))
S3L = -sqrt(complex(PI4*(RHO[L]-RHOM[L]) -
                    E0, -PI4*(fabs(IRHO[L])+EPS)))
pkienzle (Member):

Break after comma to keep expressions together.

Why does this code use -PI4... while the code in lib/magnetic.cc was changed in this PR to use +PI4...?

bmaranville (Member Author):

This is an excellent question, and I wish I had written down what I was doing.

bmaranville (Member Author):

I think I was testing the sign conventions, and never reverted the .cc file because it was no longer being used. The negative signs in this expression are designed to explicitly choose the branch for the sqrt. There might be a better way to do this.
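For reference, a small illustration of the branch choice: the principal complex square root has non-negative real part, so the leading minus sign explicitly selects the other branch.

from cmath import sqrt
z = complex(-2.5, -1e-6)
print(sqrt(z))   # principal branch: real part >= 0
print(-sqrt(z))  # the branch chosen for S1L/S3L above: real part <= 0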


sld_b, u1, u3 = calculate_u1_u3(H, rhoM, thetaM, Aguide)
sld_b, u1, u3 = calculate_u1_u3_py(H, rhoM, thetaM, Aguide)
pkienzle (Member):

Not using the numba version?

Note: could move the ..._py versions to tests/refl1d if we only need them for testing. If we want to make numba optional I suggest moving the numpy version to the same file as the numba version and conditionally return one or the other.

Or are we sure now that numba runs without difficulty on all the platforms that we care about?
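One shape the optional-numba dispatch could take (module names here are illustrative, not the actual layout):

# Hypothetical conditional import; module names are illustrative.
try:
    from .refllib import calculate_u1_u3  # numba-accelerated version
except ImportError:
    from .refllib_py import calculate_u1_u3_py as calculate_u1_u3  # numpy fallback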

bmaranville (Member Author):

This was an oversight - I switched to the pure python version before the numba version was ready, then never switched back.

refl1d/reflmodule.py (outdated; thread resolved)
# Trigger reflectivity calculation even if there is no data to
# compare against so that we can profile simulation code, and
# so that simulation smoke tests are run more thoroughly.
QR = self.reflectivity()
bmaranville (Member Author):

This seems like an important change, outside the refactor with numba... I hope this is in the other branches too

pkienzle (Member):

Not super important. In the past I've forced the call directly within the model file, but this didn't work when profiling.

@@ -59,6 +59,7 @@
('polymer', 'Polymer models'),
('probe', 'Instrument probe'),
('profile', 'Model profile'),
('refllib', 'Low level reflectivity calculations'),
pkienzle (Member):

This will pick up the docstrings from the functions defined in refllib but no module level docstrings. This may be good enough for our purposes and we don't need to figure out how to document subpackages "properly". We might drop a comment here to that effect.

bmaranville (Member Author):

I had to add all the functions to an __all__ list in refllib.py in order for them to get picked up (sketch below). I agree we can figure out how to improve the docstrings in the future. When you say "module-level docstrings" I'm not sure what you mean - I added a docstring to the top of refllib.py and it seems to have been picked up (it's pretty minimal). If you have something in mind for a comment/TODO to add to the file, please feel free to add it!
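For reference, the kind of __all__ list being described, with the function names taken from the ported-to-numba list earlier in this thread:

# refllib.py -- the docs build only picks up functions listed here
__all__ = [
    "convolve", "convolve_uniform", "convolve_gaussian", "convolve_sampled",
    "contract_profile", "align_magnetic",
    "rebin", "rebin2d",
    "reflamp", "magnetic_amplitude",
]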

@bmaranville bmaranville merged commit c08297b into master Sep 24, 2021
@pkienzle pkienzle deleted the magnetic_reflectivity_py branch May 3, 2022 20:48