Magnetic reflectivity py #131

Merged: 69 commits merged into master from magnetic_reflectivity_py on Sep 24, 2021
Conversation

bmaranville (Member Author):
This was an interesting experiment... I copied our magnetic_amplitude code from C back into Python (almost verbatim) and compiled it with numba (ahead-of-time compilation).

The resulting calculations on a real model are around 20% faster with numba than with the C extension (the first timing below is for the C extension, the second for the numba AOT extension):

(refl1d-3.8) brian@cheery:~/dev/refl1d-modelbuilder/helical_simulation$ time refl1d  ~/Downloads/Combined_Simple_2Gap_Top.txt --store=path --time_model --steps=1000
time per model eval: 41.836 ms

real	0m43.111s
user	0m43.218s
sys	0m0.314s
(refl1d-3.8) brian@cheery:~/dev/refl1d-modelbuilder/helical_simulation$ time refl1d  ~/Downloads/Combined_Simple_2Gap_Top.txt --store=path --time_model --steps=1000
loaded from compiled module
time per model eval: 34.6037 ms

real	0m36.062s
user	0m36.163s
sys	0m0.335s

The code is not really ready to use - the import-or-compile dance for the module is a bit awkward as written.
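For context, a minimal sketch of the kind of import-or-compile fallback being described (module and symbol names here are illustrative, not the actual layout in this PR):

# Hypothetical sketch of the import-or-compile dance; names are illustrative.
try:
    # Use the ahead-of-time compiled extension if it was already built.
    import magnetic_amplitude as mag_module
    print("loaded from compiled module")
except ImportError:
    # Otherwise build it now with numba.pycc (a one-time, slow step) and retry.
    from magnetic_amplitude_src import cc  # the CC('magnetic_amplitude') object
    cc.compile()
    import magnetic_amplitude as mag_module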

The optimization of just adding a fixed offset when all the theta_offsets are shared
in a PolarizedNeutronProbe (PNP) object does not work, particularly if the data needs to be oversampled.

This commit removes the broken optimized version and instead adds an
"oversampling" argument to the init of PolarizedNeutronProbe (usage sketch below).

If the theta_offsets change (e.g. if they are fit parameters), the measurement_union and
the resampling are recalculated. This is slow, but there may be no way around it
without a mesh of theta/lambda values to interpolate from.
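A usage sketch of the new argument, assuming the usual four cross-section probes (the probe names and oversampling value here are illustrative):

# Hypothetical usage; pp, pm, mp, mm are the four cross-section probes.
probe = PolarizedNeutronProbe(xs=(pp, pm, mp, mm), oversampling=10)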
#print "amplitude", depth, rho, kz, rho_index
#print depth.shape, sigma.shape, rho.shape, irho.shape, kz.shape
# print "amplitude", depth, rho, kz, rho_index
# print depth.shape, sigma.shape, rho.shape, irho.shape, kz.shape
pkienzle (Member):
Personal style: I use no spaces when commenting out code vs. one space when writing comments about code. Black will of course ignore me and put a space in there.

bmaranville (Member Author):

The auto code formatter in VS Code puts in the space... and if we are going to try to follow PEP 8, I think it's too much work to fight both battles.

pkienzle (Member):

My solution would be to avoid the VS Code formatter, since it has already demonstrated in other contexts that it doesn't know how to format Python code.

bmaranville (Member Author):

If you can suggest an alternative formatter to use in VS Code I would be happy to use it, but when formatting thousands of lines we can't really do it by hand. I just checked, and what I'm using now is "autopep8"; I also have "black" and "yapf".
I think some of the earlier issues I had with it were user error (on my part). If you can find errors in the many "pep8" commits that I just pushed, I will adjust my settings.

bmaranville (Member Author):

My earlier point was that it is unsustainable to try to maintain a style that is not automatically respected by autoformatters (or automatically enforced by them). If we are going to enforce style, we have to rely on the tools.

pkienzle (Member):

The main tool I've used on other projects is lint, with periodic manual cleanup of lint errors. We had a discussion on the sasmodels repo about a tool to check whether a commit increases the lint count and to flag those that do. It seemed mostly doable, but we didn't act on it.

Regarding auto-formatters: PEP 8 was intended to be prescriptive, not descriptive. Code "readability" trumps formatting consistency. When VS Code breaks a line in the middle of an expression rather than breaking at a comma, that hurts readability. If you would like to use a code formatter, please look for one that knows how to format code.

pkienzle (Member):

I see that the settings let you choose between black, autopep8, and yapf. I haven't tried any of them.

# kz = -kz
# depth, rho, irho, sigma, rhoM, thetaM = [v[::-1] for v in (depth, rho, irho, sigma, rhoM, thetaM)]
depth, rho, irho, sigma = [_dense(a, 'd')
                           for a in (depth, rho, irho, sigma)]
pkienzle (Member), Jul 9, 2021:

Maybe indent as:

LHS = [
    item for a in list]

or if item or list is a long expression, then

LHS = [
    item
    for a in list
]

I find this makes for less work when changing LHS target.

refl1d/reflectivity.py (outdated; thread resolved)
YD[i] = (B23*B42-B43*B22)/DETW # --

@cc.export('mag_amplitude', 'void(f8[:], f8[:], f8[:], f8[:], f8[:], c16[:], c16[:], f8, f8[:], i4[:], c16[:], c16[:], c16[:], c16[:])')
@njit(parallel=True, cache=True)
pkienzle (Member):
Put the type signature with the njit rather than the cc.
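A sketch of one reading of this suggestion: declare the signature once, attach it to @njit, and let @cc.export reuse the same constant (parameter names are illustrative):

MAGAMP_SIG = 'void(f8[:], f8[:], f8[:], f8[:], f8[:], c16[:], c16[:], f8, f8[:], i4[:], c16[:], c16[:], c16[:], c16[:])'

@cc.export('mag_amplitude', MAGAMP_SIG)
@njit(MAGAMP_SIG, parallel=True, cache=True)
def mag_amplitude(d, sigma, rho, irho, rhoM, u1, u3, H, kz, rho_index, Ra, Rb, Rc, Rd):
    ...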

bmaranville (Member Author):

I wasn't aware that this would work!

B2SLD = 2.31604654 # Scattering factor for B field 1e-6

from numba.pycc import CC
cc = CC('magnetic_amplitude')
pkienzle (Member):

Any reason to do AOT compile rather than simply shipping numba and letting it use whatever optimization is available for the platform it is running on? Sounds like less work to me. Startup times will be slightly slower, but not an issue when fitting since the compile only needs to happen once. Or maybe once per process: I don't know how numba interacts with multiprocessing.

Note that numba is able to parallelize across cores and sockets on the same machine, so it may be faster to run bumps single threaded with the nllf running in parallel rather than running the optimizer in parallel with the nllf single threaded. Or maybe fewer threads than cores to allow some overlap during the non-parallel bits while reducing contention during the parallel bits.
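A self-contained sketch of that JIT-only alternative (a toy kernel, not the real code): numba compiles on first call and caches the result on disk for later runs.

from numba import njit, prange
import numpy as np

@njit(parallel=True, cache=True)  # compiled on first call; reused from the on-disk cache afterwards
def scale(x, a):
    out = np.empty_like(x)
    for i in prange(len(x)):  # numba parallelizes this loop across cores
        out[i] = a * x[i]
    return out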

bmaranville (Member Author):

The startup times were pretty long when using njit without AOT... 10+ seconds on my machine. There are probably things that can be done to speed this up. I found that for whatever reason the njit code was slightly slower than the AOT code, though that's not supposed to be true.

pkienzle (Member):

Caching should take care of that. You are doing this by hand anyway, so let numba take care of it. It seems to work for me. It took 36s to run check_examples.py with njit alone and 38s to run with precompiled, in both cases having updated the cache prior to timing.

bmaranville (Member Author):

I found that it doesn't cache between starts of refl1d - it recompiles every time the program is started, which is potentially very annoying.

pkienzle (Member):

That hasn't been my experience (mac conda, python=3.8, numba=0.53.1). If you touch the file it'll recompile, but otherwise it should use the cache. Any chance you are touching the .py file between runs? [obviously not, but I have to ask...]

There are some issues on this from two years ago. Perhaps they weren't resolved?

Are there any warnings showing up if you run from the command line?

The docs tell you how to set the cache directory. Can you check that the cache is being created?
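One way to check, per the numba docs (the directory path is just an example): set NUMBA_CACHE_DIR before anything compiles, then look for the cache files after a run.

import os
# Must be set before numba compiles anything in the process.
os.environ["NUMBA_CACHE_DIR"] = "/tmp/numba_cache"
# After running a model once, /tmp/numba_cache should contain .nbi/.nbc
# files if on-disk caching is actually working.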

bmaranville (Member Author):

You are correct - the cache seems to be working. I think I was thrown off because I was editing some other parts of the source file (not in the numba functions) and I didn't think that would affect the cache, but it does.

refl1d/reflectivity.py (two more outdated threads, resolved)
bmaranville (Member Author):

I had to fix up zeeman_check.py, but the new numba calculation seems to check out compared to the canonical gepore result:
[figure: zeeman_check comparison showing the new numba calculation matching the canonical gepore result]

pkienzle (Member):

Sometimes shorter code can help the compiler and processor. I saved a little time (5%) by simplifying the B initialization, using the following:

    Z = 0.
    S1LP = -sqrt(complex(PI4*(RHO[L]+RHOM[L]) - E0, -PI4*(fabs(IRHO[L])+EPS)))
    S3LP = -sqrt(complex(PI4*(RHO[L]-RHOM[L]) - E0, -PI4*(fabs(IRHO[L])+EPS)))
    BLP = GLP = 0j
    B11 = B22 = B33 = B44 = 1. + 0j
    B12 = B13 = B14 = B21 = B23 = B24 = B31 = B32 = B34 = B41 = B42 = B43 = 0j
    for I in range(0,N-1):
        ...

Note: I didn't check that the result was correct; I only wanted to see if it would be faster.

This may be worth doing to make the code easier to maintain. Fewer repeated expressions. I'll let you decide.

pkienzle (Member) commented Jul 21, 2021:

Please change dtype='complex128' to dtype=np.complex128. Numba was complaining on one of my environments when given the string version.
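The requested change in isolation:

import numpy as np
points = 100  # illustrative
R = np.empty((points, 4), dtype='complex128')   # string dtype: rejected by numba in some environments
R = np.empty((points, 4), dtype=np.complex128)  # preferred form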

Or we can return a tuple instead of passing in a numpy output array:

CR4XA_SIG = 'UniTuple(c16,4)(i8, f8[:], f8[:], f8, f8[:], f8[:], f8[:], c16[:], c16[:], f8)'

Then use code like the following to receive it:

Ra[i], Rb[i], _, _ = Cr4xa(...)

etc.

No difference in performance.
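A sketch of how Cr4xa would look with the tuple return (argument names assumed from the signature above):

@njit(CR4XA_SIG, cache=True)
def Cr4xa(N, D, SIGMA, IP, RHO, IRHO, RHOM, U1, U3, KZ):
    # ... existing body computing the four amplitudes YA, YB, YC, YD ...
    return YA, YB, YC, YD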

pkienzle (Member):

I seem to be in micro-optimization mode.

It's probably too small to notice, but the following should be slightly faster:

MAGAMP_SIG = 'c16[:,:](f8[:], f8[:], f8[:], f8[:], f8[:], c16[:], c16[:], f8[:])'
@njit(MAGAMP_SIG, parallel=True, cache=True)
def magnetic_amplitude_py(d, sigma, rho, irho, rhoM, u1, u3, KZ):
    layers = len(d)
    points = len(KZ)
    R = np.empty((points, 4), dtype=np.complex128)
    for i in prange(points):
        Cr4xa(layers, d, sigma, 1.0, rho, irho, rhoM, u1, u3, KZ[i], R[i])
    # If magnetism in incident or backing media then recompute with IP=-1.0,
    # assigning to R[2] and R[3] only. This must come after the calculation for IP=1.0,
    # which assigns to all of R[0], R[1], R[2], R[3].
    if abs(rhoM[0]) > MINIMAL_RHO_M or abs(rhoM[layers-1]) > MINIMAL_RHO_M:
        for i in prange(points):
            Cr4xa(layers, d, sigma, -1.0, rho, irho, rhoM, u1, u3, KZ[i], R[i])
    return R

modifying CR4XA to only assign Y[0], Y[1] when IP > 0.

This avoids memory thrashing and copying associated with Y, allocating the entire return vector in one step.
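So the tail of CR4XA would become something like this sketch:

# The IP=-1.0 pass must not clobber the Y[0], Y[1] results from the IP=+1.0 pass.
if IP > 0:
    Y[0] = YA
    Y[1] = YB
Y[2] = YC
Y[3] = YD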

bmaranville (Member Author):

> Sometimes shorter code can help the compiler and processor. I saved a little time (5%) by simplifying the B initialization [...] This may be worth doing to make the code easier to maintain. Fewer repeated expressions. I'll let you decide.

Sure! Do we have a good test for magnetic reflectivity calculations, besides zeeman_check (which is not set up to be a CI test, as far as I can tell)?

pkienzle (Member):

I run the sample models before releasing but only as a smoke test; I don't look at the results.

This is done using python check_examples.py --chisq.

It wouldn't take too much to turn this into a CI test. Just check whether the chisq_str has changed. Meanwhile, you can do this manually using python doc/examples/spinvalve/n101G.py --chisq before and after code mods.
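A rough sketch of what that CI check could look like (the helper and reference names are illustrative):

import subprocess

REFERENCE = {"doc/examples/spinvalve/n101G.py": "[chisq=...]"}  # stored reference strings

def check_chisq(model_path):
    # Run the example with --chisq and flag any change from the stored value.
    out = subprocess.run(["python", model_path, "--chisq"],
                         capture_output=True, text=True).stdout
    assert REFERENCE[model_path] in out, f"chisq changed for {model_path}"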

bmaranville (Member Author):

The numba convolve_gaussian performs almost exactly the same as the (nearly identical) C code from which it was copied. Not very surprising.

bmaranville (Member Author):

@pkienzle this branch now runs and passes all tests without compiling reflmodule.
These have all been ported to numba:

  • convolve, convolve_uniform, convolve_gaussian, convolve_sampled
  • contract_profile (incl. align_magnetic)
  • rebin, rebin2d
  • reflamp, magnetic_amplitude

The setup.py has been modified so it does not try to build the binary extension. It now builds a wheel tagged py3-none-any (a universal Python 3 wheel).

All the chisq tests return the same number as on the reflmodule.so branch, except for doc/examples/freemag/pmf.py.
I haven't figured out why that one is slightly different.

pkienzle (Member) left a review:

Please verify the sign on the complex part of the refractive index for the magnetic version, which at a glance looks inconsistent between the numba and the C code.

We should add python_requires='>=3.8' to setup.py, or make it 3.7 and add 3.7 to the CI tests.

Rename reflmodule.py to something that won't conflict with the old .so file for those of us running an "inplace" version.

Address the minor comments as you see fit and feel free to pull.
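The setup.py addition, in the 3.7 variant discussed below:

from setuptools import setup

setup(
    # ... existing arguments ...
    python_requires='>=3.7',  # or '>=3.8' if we keep 3.8 as the floor
)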

- { os: ubuntu-latest, py: 3.8, doc: 1 }
- { os: windows-latest, py: 3.6, whl: 1 }
- { os: windows-latest, py: 3.7, whl: 1 }
- { os: ubuntu-latest, py: 3.8, doc: 1, whl: 1 }
pkienzle (Member):

Is 3.8 now required? NumPy is sitting at 3.7 as its oldest supported version, so ideally we would match that (though it's not important).

bmaranville (Member Author):

I think I was using 3.8 as the target for the dataclasses branch, but I don't remember why. I don't think it's important - 3.7 is fine.



#@numba.njit(ALIGN_MAGNETIC_SIG, parallel=False, cache=True)
@numba.njit(cache=True)
pkienzle (Member):

Why no function signature?

If it is because the inputs are not of the correct type and we are relying on conversion during the call, then I would prefer to trigger an error here and force the caller to produce the correct types.
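To illustrate the difference (toy example): with an explicit signature, a mismatched argument type raises at call time instead of being silently converted.

import numpy as np
import numba

@numba.njit('f8[:](f8[:])', cache=True)
def double(x):
    return x * 2.0

double(np.arange(3.0))  # ok: float64 input matches f8[:]
double(np.arange(3))    # raises TypeError: no matching definition for int64 input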

bmaranville (Member Author):

For some of these there was explicit polymorphism encoded in the wrapper, and not specifying a signature seemed like a reasonable way to address that in numba. I did some testing and was not able to see any performance difference between signature and no signature. For this function I suspect the usage of the variables is pretty unambiguous to numba.


#@numba.njit(CONTRACT_MAG_SIG, parallel=False, cache=True)
@numba.njit(cache=True)
def contract_mag(d, sigma, rho, irho, rhoM, thetaM, dA):
pkienzle (Member):

I believe this code is broken and the magnetic folk always run with dA=0. I've added a ticket.

@@ -0,0 +1,174 @@
import numba
import numba.experimental
import math
pkienzle (Member):

Rebin is only used for the pulsed instrument simulation code. In practice this is dead code and should be removed. Maybe drop rebin.py and raise a NotImplementedError in the simulation code. No need to do so immediately. I've added a ticket.

Note: it may be worth comparing the performance of this numba code to the numpy code we have in reductus.

refl1d/rebin.py (outdated)
from . import reflmodule as _cmodule

from . import reflmodule
pkienzle (Member):

Lint complains about spaces at the ends of lines.

refl1d/probe.py (thread resolved)
#
S1L = -sqrt(complex(PI4*(RHO[L]+RHOM[L]) - E0, -PI4*(fabs(IRHO[L])+EPS)))
S3L = -sqrt(complex(PI4*(RHO[L]-RHOM[L]) -
                    E0, -PI4*(fabs(IRHO[L])+EPS)))
pkienzle (Member):

Break after comma to keep expressions together.

Why does this code use -PI4... while the code in lib/magnetic.cc was changed in this PR to use +PI4...?

bmaranville (Member Author):

This is an excellent question, and I wish I had written down what I was doing.

bmaranville (Member Author):

I think I was testing the sign conventions, and never reverted the .cc file because it was no longer being used. The negative signs in this expression are designed to explicitly choose the branch for the sqrt. There might be a better way to do this.
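For reference, a small illustration of the branch choice: the principal complex square root has non-negative real part, so the leading minus sign explicitly selects the other branch.

from cmath import sqrt
z = complex(-2.5, -1e-6)
print(sqrt(z))   # principal branch: real part >= 0
print(-sqrt(z))  # the branch chosen for S1L/S3L above: real part <= 0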


sld_b, u1, u3 = calculate_u1_u3(H, rhoM, thetaM, Aguide)
sld_b, u1, u3 = calculate_u1_u3_py(H, rhoM, thetaM, Aguide)
pkienzle (Member):

Not using the numba version?

Note: could move the ..._py versions to tests/refl1d if we only need them for testing. If we want to make numba optional I suggest moving the numpy version to the same file as the numba version and conditionally return one or the other.

Or are we sure now that numba runs without difficulty on all the platforms that we care about?
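One shape the optional-numba dispatch could take (module names here are illustrative, not the actual layout):

# Hypothetical conditional import; module names are illustrative.
try:
    from .refllib import calculate_u1_u3  # numba-accelerated version
except ImportError:
    from .refllib_py import calculate_u1_u3_py as calculate_u1_u3  # numpy fallback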

bmaranville (Member Author):

This was an oversight - I switched to the pure python version before the numba version was ready, then never switched back.

refl1d/reflmodule.py (outdated; thread resolved)
# Trigger reflectivity calculation even if there is no data to
# compare against so that we can profile simulation code, and
# so that simulation smoke tests are run more thoroughly.
QR = self.reflectivity()
bmaranville (Member Author):

This seems like an important change, outside the refactor with numba... I hope this is in the other branches too

pkienzle (Member):

Not super important. In the past I've forced the call directly within the model file, but this didn't work when profiling.

@@ -59,6 +59,7 @@
('polymer', 'Polymer models'),
('probe', 'Instrument probe'),
('profile', 'Model profile'),
('refllib', 'Low level reflectivity calculations'),
pkienzle (Member):

This will pick up the docstrings from the functions defined in refllib but no module level docstrings. This may be good enough for our purposes and we don't need to figure out how to document subpackages "properly". We might drop a comment here to that effect.

bmaranville (Member Author):

I had to add all the functions to an __all__ list in refllib.py in order for them to get picked up (sketch below). I agree we can figure out how to improve the docstrings in the future. When you say "module-level docstrings" I'm not sure what you mean - I added a docstring to the top of refllib.py and it seems to have been picked up (it's pretty minimal). If you have something in mind for a comment/TODO to add to the file, please feel free to add it!
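For reference, the kind of __all__ list being described, with the function names taken from the ported-to-numba list earlier in this thread:

# refllib.py -- the docs build only picks up functions listed here
__all__ = [
    "convolve", "convolve_uniform", "convolve_gaussian", "convolve_sampled",
    "contract_profile", "align_magnetic",
    "rebin", "rebin2d",
    "reflamp", "magnetic_amplitude",
]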

@bmaranville bmaranville merged commit c08297b into master Sep 24, 2021
@pkienzle pkienzle deleted the magnetic_reflectivity_py branch May 3, 2022 20:48