Magnetic reflectivity py #131
Conversation
The optimization of just adding a fixed offset when all the theta_offsets are shared in a PNP object does not work, particularly if the data needs to be oversampled. This commit removes the optimized version which doesn't work, and instead adds an "oversampling" argument to the init of PolarizedNeutronProbe. If the theta_offsets change (if they are a fit parameter) the measurement_union and the resampling are recalculated. This is slow but there may be no way around this without a mesh of theta/lambda values to interpolate from.
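The caching strategy described above can be sketched as follows (hypothetical class and field names, a simplified stand-in rather than the actual PolarizedNeutronProbe API):

```python
class PolarizedProbeSketch:
    def __init__(self, xs, oversampling=None):
        self.xs = xs                     # the cross-section probes
        self.oversampling = oversampling
        self._last_offset = None
        self._union = None

    def _theta_offset(self):
        # stand-in for the shared theta_offset fit parameter
        return self.xs[0]["theta_offset"]

    def measurement_union(self):
        offset = self._theta_offset()
        if self._union is None or offset != self._last_offset:
            # slow path: rebuild the union of measurement angles; the caller
            # would then resample against this new grid
            self._union = sorted({t + offset for p in self.xs for t in p["theta"]})
            self._last_offset = offset
        return self._union

probe = PolarizedProbeSketch([
    {"theta_offset": 0.1, "theta": [0.0, 0.5]},
    {"theta_offset": 0.1, "theta": [0.5, 1.0]},
])
union = probe.measurement_union()
```

The point is simply that the expensive union/resample step runs only when the shared offset has moved since the last evaluation, which is the behavior the commit message describes.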
#print "amplitude", depth, rho, kz, rho_index
#print depth.shape, sigma.shape, rho.shape, irho.shape, kz.shape
# print "amplitude", depth, rho, kz, rho_index
# print depth.shape, sigma.shape, rho.shape, irho.shape, kz.shape
Personal style: I use no spaces when commenting out code vs. one space when writing comments about code. Black will of course ignore me and put a space in there.
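For illustration, the convention being described looks like this (toy variables):

```python
#print("debug:", depth, rho)   # commented-out code: no space after '#'
# Prose comment describing the code below: one space after '#'.
depth, rho = 1, 2
```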
The auto code formatter in VS Code puts in the space... and if we are going to try to follow PEP 8, I think it's too much work to fight both battles.
My solution would be to avoid the vscode formatter since it has already demonstrated in other contexts that it doesn't know how to format python code.
If you can suggest an alternative formatter to use in vscode I would be happy to use it, but when formatting thousands of lines we can't really do it by hand. I just checked and what I'm using now is "autopep8". I also have "black" and "yapf".
I think some of the earlier issues I had with it were user error (on my part). If you can find errors in the many "pep8" commits that I just pushed I will adjust my settings.
My earlier point was that it is unsustainable to try to maintain a style that is not automatically respected by autoformatters (or automatically enforced by them). If we are going to enforce style, we have to rely on the tools.
The main tool I've used on other projects is lint, with periodic manual cleanup of lint errors. We had a discussion on the sasmodels repo about a tool to check whether a commit increased the lint count and flag those that do. Seems mostly doable, but we didn't act on it.
Regarding auto-formatters, PEP 8 was intended to be prescriptive, not descriptive. Code "readability" trumps formatting consistency. When vscode breaks a line in the middle of an expression rather than breaking at a comma, that hurts readability. If you would like to use a code formatter, please look for one that knows how to format code.
I see settings lets you choose between black, autopep8 and yapf. I haven't tried any of them.
refl1d/reflectivity.py
Outdated
# kz = -kz
# depth, rho, irho, sigma, rhoM, thetaM = [v[::-1] for v in (depth, rho, irho, sigma, rhoM, thetaM)]
depth, rho, irho, sigma = [_dense(a, 'd')
                           for a in (depth, rho, irho, sigma)]
Maybe indent as:

    LHS = [
        item for a in list]

or, if item or list is a long expression, then

    LHS = [
        item
        for a in list
    ]

I find this makes for less work when changing the LHS target.
refl1d/reflectivity.py
Outdated
YD[i] = (B23*B42-B43*B22)/DETW # --

@cc.export('mag_amplitude', 'void(f8[:], f8[:], f8[:], f8[:], f8[:], c16[:], c16[:], f8, f8[:], i4[:], c16[:], c16[:], c16[:], c16[:])')
@njit(parallel=True, cache=True)
Put the type signature with the njit rather than the cc.
I wasn't aware that this would work!
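For reference, a minimal sketch of the pattern (illustrative function and signature, not refl1d code; the try/except keeps the sketch runnable when numba is absent):

```python
try:
    from numba import njit
except ImportError:
    def njit(*args, **kwargs):          # no-op stand-in when numba is absent
        def wrap(fn):
            return fn
        return wrap

# The eager signature goes on @njit itself, not on cc.export:
# 'f8(f8, f8)' means double f(double, double).
@njit('f8(f8, f8)', cache=True)
def scaled_sum(a, b):
    return 2.0*a + b
```

With an explicit signature, numba compiles the function eagerly at decoration time and rejects calls with the wrong argument types instead of silently specializing.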
refl1d/reflectivity.py
Outdated
B2SLD = 2.31604654 # Scattering factor for B field 1e-6

from numba.pycc import CC
cc = CC('magnetic_amplitude')
Any reason to do AOT compile rather than simply shipping numba and letting it use whatever optimization is available for the platform it is running on? Sounds like less work to me. Startup times will be slightly slower, but not an issue when fitting since the compile only needs to happen once. Or maybe once per process: I don't know how numba interacts with multiprocessing.
Note that numba is able to parallelize across cores and sockets on the same machine, so it may be faster to run bumps single threaded with the nllf running in parallel rather than running the optimizer in parallel with the nllf single threaded. Or maybe fewer threads than cores to allow some overlap during the non-parallel bits while reducing contention during the parallel bits.
The startup times were pretty long when using njit without AOT... 10+ seconds on my machine. There are probably things that can be done to speed this up. I found that for whatever reason the njit code was slightly slower than the AOT code, though that's not supposed to be true.
Caching should take care of that. You are doing this by hand anyway, so let numba take care of it. It seems to work for me. It took 36s to run check_examples.py with njit alone and 38s to run with precompiled, in both cases having updated the cache prior to timing.
I found that it doesn't cache between starts of refl1d - it recompiles every time the program is started, which is potentially very annoying.
That hasn't been my experience (mac conda, python=3.8, numba=0.53.1). If you touch the file it'll recompile, but otherwise it should use the cache. Any chance you are touching the .py file between runs? [obviously not, but I have to ask...]
There are some issues on this from two years ago. Perhaps they weren't resolved?
Are there any warnings showing up if you run from the command line?
The docs tell you how to set the cache directory. Can you check that the cache is being created?
You are correct - the cache seems to be working. I think I was thrown off because I was editing some other parts of the source file (not in the numba functions) and I didn't think that would affect the cache, but it does.
Sometimes shorter code can help the compiler and processor. I saved a little time (5%) by simplifying the B initialization, using the following:

    Z = 0.
    S1LP = -sqrt(complex(PI4*(RHO[L]+RHOM[L]) - E0, -PI4*(fabs(IRHO[L])+EPS)))
    S3LP = -sqrt(complex(PI4*(RHO[L]-RHOM[L]) - E0, -PI4*(fabs(IRHO[L])+EPS)))
    BLP = GLP = 0j
    B11 = B22 = B33 = B44 = 1. + 0j
    B12 = B13 = B14 = B21 = B23 = B24 = B31 = B32 = B34 = B41 = B42 = B43 = 0j
    for I in range(0, N-1):
        ...

Note: I didn't check that the result was correct; I only wanted to see if it would be faster. This may be worth doing to make the code easier to maintain - fewer repeated expressions. I'll let you decide.
Please change ... Or we can return a tuple instead of passing in a numpy output array:

    CR4XA_SIG = 'UniTuple(c16,4)(i8, f8[:], f8[:], f8, f8[:], f8[:], f8[:], c16[:], c16[:], f8)'

Then use code like the following to receive it: ... etc. No difference in performance.
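The elided receiving code presumably looked something like this sketch (a plain-Python stand-in; the Cr4xa variant and its arguments are simplified, not the actual refl1d function):

```python
# Stand-in for a Cr4xa variant that returns the four cross-section
# amplitudes as a tuple (UniTuple(c16, 4)) instead of writing into a
# preallocated output slice R[i].
def cr4xa_tuple(kz):
    # ... real code would compute the transfer matrices here ...
    return (1 + 0j, 0j, 0j, 1 + 0j)

Ra, Rb, Rc, Rd = cr4xa_tuple(0.01)     # unpack the returned tuple
```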
I seem to be in micro-optimization mode. It's probably too small to notice, but the following should be slightly faster:

    MAGAMP_SIG = 'c16[:,:](f8[:], f8[:], f8[:], f8[:], f8[:], c16[:], c16[:], f8[:])'

    @njit(MAGAMP_SIG, parallel=True, cache=True)
    def magnetic_amplitude_py(d, sigma, rho, irho, rhoM, u1, u3, KZ):
        layers, points = len(d), len(KZ)
        R = np.empty((points, 4), dtype=np.complex128)
        for i in prange(points):
            Cr4xa(layers, d, sigma, 1.0, rho, irho, rhoM, u1, u3, KZ[i], R[i])
        # If magnetism in incident or backing media then recompute with IP=-1.0,
        # assigning to R[2] and R[3] only. This must come after the calculation
        # for IP=1.0, which assigns to all of R[0], R[1], R[2], R[3].
        if abs(rhoM[0]) <= MINIMAL_RHO_M and abs(rhoM[layers-1]) <= MINIMAL_RHO_M:
            for i in prange(points):
                Cr4xa(layers, d, sigma, -1.0, rho, irho, rhoM, u1, u3, KZ[i], R[i])
        return R

modifying Cr4xa to only assign R[2] and R[3] when IP=-1.0. This avoids the memory thrashing and copying associated with Y, allocating the entire return array in one step.
Sure! Do we have a good test for magnetic reflectivity calculations, besides zeeman_check (which is not set up to be a CI test, as far as I can tell)?
I run the sample models before releasing, but only as a smoke test; I don't look at the results. This is done using ... It wouldn't take too much to turn this into a CI test - just check whether the chisq_str has changed. Meanwhile, you can do this manually using ...
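A CI version of that check might look something like this sketch (the stored values, paths, and the check_chisq helper are all hypothetical, not an existing refl1d facility):

```python
# Expected chisq strings recorded from a known-good run (made-up values).
EXPECTED_CHISQ = {
    "doc/examples/polymer/model.py": "1.234(5)",
}

def check_chisq(path, chisq_str):
    # fail the build if the fit statistic drifts from the recorded value
    assert EXPECTED_CHISQ[path] == chisq_str, f"chisq changed for {path}"

check_chisq("doc/examples/polymer/model.py", "1.234(5)")
```

Regenerating the expected values would be a deliberate step whenever the model code intentionally changes, so an accidental numerical drift shows up as a red build.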
The numba convolve gaussian seems to perform almost exactly the same as the (nearly identical) C code from which it was copied. Not very surprising.
@pkienzle this branch now runs and passes all tests without compiling reflmodule.
The setup.py has been modified so it does not try to build the binary extension. It builds a wheel with tag py3-none-any (a universal Python 3 wheel). All the chisq tests return the same number as in the reflmodule.so branch, except for doc/examples/freemag/pmf.py
Please verify the sign on the complex part of the refractive index for the magnetic version, which at a glance looks inconsistent between the numba and the C code.
We should add the following to setup.py: python_requires='>=3.8', or make it 3.7 and add 3.7 to the CI tests.
Rename reflmodule.py to something that won't conflict with the old .so file for those of us running an "inplace" version.
Address the minor comments as you see fit and feel free to pull.
.github/workflows/test.yml
Outdated
- { os: ubuntu-latest, py: 3.8, doc: 1 }
- { os: windows-latest, py: 3.6, whl: 1 }
- { os: windows-latest, py: 3.7, whl: 1 }
- { os: ubuntu-latest, py: 3.8, doc: 1, whl: 1 }
Is 3.8 now required? NumPy is sitting at 3.7 as its oldest supported version, so ideally we would match that (though it's not important).
I think I was using 3.8 as the target for the dataclasses branch, but I don't remember why. I don't think it's important - 3.7 is fine.
#@numba.njit(ALIGN_MAGNETIC_SIG, parallel=False, cache=True)
@numba.njit(cache=True)
Why no function signature?
If it is because the inputs are not of the correct type and we are relying on conversion during the call, then I would prefer to trigger an error here and force the caller to produce the correct types.
For some of these there was explicit polymorphism encoded in the wrapper, and it seemed like a reasonable policy in numba to address that by not specifying a signature. I did some testing and was not able to see any performance difference between signature and no signature. For this function I suspect the usage of the variables is pretty unambiguous to numba.
#@numba.njit(CONTRACT_MAG_SIG, parallel=False, cache=True)
@numba.njit(cache=True)
def contract_mag(d, sigma, rho, irho, rhoM, thetaM, dA):
I believe this code is broken and the magnetic folk always run with dA=0. I've added a ticket.
@@ -0,0 +1,174 @@
import numba
import numba.experimental
import math
Rebin is only used by the pulsed instrument simulation code. In practice this is dead code and should be removed. Maybe drop rebin.py and raise a NotImplementedError in the simulation code. No need to do so immediately. I've added a ticket.
Note: it may be worth comparing the performance of this numba code to the numpy code we have in reductus.
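The suggested stub might look like this sketch (the function signature is hypothetical, not the actual rebin API):

```python
# Drop the rebin implementation and fail loudly if the pulsed-instrument
# simulation path is ever exercised.
def rebin(x_in, counts_in, x_out):
    raise NotImplementedError(
        "rebin was removed; pulsed-instrument simulation is unsupported")
```

Failing loudly here is safer than silently keeping unverified dead code on a rarely used path.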
refl1d/rebin.py
Outdated
from . import reflmodule as _cmodule

from . import reflmodule
Lint complains about spaces at the ends of lines.
#
S1L = -sqrt(complex(PI4*(RHO[L]+RHOM[L]) - E0, -PI4*(fabs(IRHO[L])+EPS)))
S3L = -sqrt(complex(PI4*(RHO[L]-RHOM[L]) - E0, -PI4*(fabs(IRHO[L])+EPS)))
Break after the comma to keep expressions together.
Why does this code use -PI4... while the code in lib/magnetic.cc was changed in this PR to use +PI4...?
This is an excellent question, and I wish I had written down what I was doing.
I think I was testing the sign conventions, and never reverted the .cc file because it was no longer being used. The negative signs in this expression are designed to explicitly choose the branch for the sqrt. There might be a better way to do this.
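The branch choice can be checked with plain cmath (standard-library behavior, not refl1d code): the principal square root always has a nonnegative real part, so negating it, together with the negative imaginary part of the argument, pins the result to one branch.

```python
import cmath

# Principal sqrt returns the root with Re >= 0; negating selects Re <= 0.
# The negative imaginary part in the argument keeps the chosen root on a
# consistent side of the branch cut, mimicking -sqrt(complex(..., -PI4*...)).
S = -cmath.sqrt(complex(-1.0, -1e-6))
```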
refl1d/reflectivity.py
Outdated
sld_b, u1, u3 = calculate_u1_u3(H, rhoM, thetaM, Aguide)
sld_b, u1, u3 = calculate_u1_u3_py(H, rhoM, thetaM, Aguide)
Not using the numba version?
Note: we could move the ..._py versions to tests/refl1d if we only need them for testing. If we want to make numba optional, I suggest moving the numpy version to the same file as the numba version and conditionally returning one or the other.
Or are we sure now that numba runs without difficulty on all the platforms that we care about?
This was an oversight - I switched to the pure python version before the numba version was ready, then never switched back.
# Trigger reflectivity calculation even if there is no data to
# compare against so that we can profile simulation code, and
# so that simulation smoke tests are run more thoroughly.
QR = self.reflectivity()
This seems like an important change, outside the refactor with numba... I hope this is in the other branches too
Not super important. In the past I've forced the call directly within the model file, but this didn't work when profiling.
@@ -59,6 +59,7 @@
('polymer', 'Polymer models'),
('probe', 'Instrument probe'),
('profile', 'Model profile'),
('refllib', 'Low level reflectivity calculations'),
This will pick up the docstrings from the functions defined in refllib but no module level docstrings. This may be good enough for our purposes and we don't need to figure out how to document subpackages "properly". We might drop a comment here to that effect.
I had to add all the functions to an __all__ list in refllib.py in order for them to get picked up. I agree we can figure out how to improve the docstrings in the future. When you say "module-level docstrings" I'm not sure what you mean - I added a docstring to the top of refllib.py and it seems to have been picked up (it's pretty minimal). If you have something in mind for a comment/todo to add to the file, please feel free to add it!
This was an interesting experiment... I copied our magnetic_amplitude code from C back into python (almost verbatim) and compiled it with numba (Ahead Of Time compile)
The resulting calculations for a real model are around 20% faster with numba than with the C extension (the first number is for the C extension, the second for the numba AOT extension): ...
The code is not really ready to use - the import-or-compile dance for the module is a bit awkward as written.