Crash when the number of points changes on the latest version #44
Do you use vectorized=True? If so, can you double check that your transform function returns a 2d array when called with one live point, i.e.,
It seems recv_samplesv in this line https://github.com/JohannesBuchner/UltraNest/blob/master/ultranest/integrator.py#L1356 has inconsistent dimensionality (can you print it?). |
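A quick way to double-check this is a small standalone helper (hypothetical, not part of UltraNest): call the transform with a single live point of shape (1, ndim) and verify the result is still 2d.

```python
import numpy as np

def check_transform_is_2d(transform, ndim):
    # call the vectorized transform with exactly one live point, shape (1, ndim)
    one_point = np.random.uniform(size=(1, ndim))
    out = np.asarray(transform(one_point))
    assert out.ndim == 2, f"expected a 2d array, got shape {out.shape}"
    return out.shape

# example: a vectorized transform mapping the unit cube to [400, 800]
print(check_transform_is_2d(lambda cube: cube * 400 + 400, ndim=3))  # (1, 3)
```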
class UltraNestFitter():
    ...
    def rescale(self, cube):
        params = np.empty_like(cube)
        for i, parameter in enumerate(self.parameters):
            params[i] = cube[i] * (parameter.max - parameter.min) + parameter.min
        return params

It is a method which is passed as the transform. From your examples:

def my_prior_transform(cube):
    params = cube.copy()
    lo = 400
    hi = 800
    params[0] = cube[0] * (hi - lo) + lo

This is a corresponding test. The transform works for both 1d and 2d arrays. Should it return 2d arrays when it is called with one point? I missed that.

def test_transform_function():
    parameters = [
        Parameter("a", min=10, max=20),
        Parameter("b", min=-5, max=15)
    ]
    fitter = UltraNestFitter(
        lnprob=lambda params: -((params[0] - 15) ** 2 + params[1] ** 2),
        parameters=parameters,
        log_dir="transform_test"
    )
    assert pytest.approx(fitter.transform(np.array([0.5, 0.2]))) == [15, -1]
    assert pytest.approx(fitter.transform(np.array([0.4, 0.8]))) == [14, 11]
    assert pytest.approx(fitter.transform(np.array([[0.5, 0.4], [0.2, 0.8]]))) == np.array([[15, 14], [-1, 11]])
    assert pytest.approx(fitter.transform([0.5, 0.2])) == [15, -1]
    assert fitter.transform(np.random.uniform(size=(2, 1))).shape == (2, 1)
    assert fitter.transform(np.random.uniform(size=(1, 2))).shape == (1, 2)

In this test, the last line will fail with
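The last assertion can be reproduced standalone (my reconstruction, with the two parameter bounds from the test hard-coded): the loop indexes cube[i] by parameter number, which on a 2d input selects rows, so an input with fewer rows than parameters raises an IndexError.

```python
import numpy as np

# standalone sketch of the rescale loop, with bounds of
# parameters a (10..20) and b (-5..15) hard-coded
def rescale(cube, mins=(10, -5), spans=(10, 20)):
    params = np.empty_like(cube)
    for i in range(2):
        params[i] = cube[i] * spans[i] + mins[i]  # cube[i] selects a ROW of a 2d input
    return params

print(rescale(np.array([0.5, 0.2])))                  # [15. -1.] -- 1d input works
print(rescale(np.random.uniform(size=(2, 1))).shape)  # (2, 1) -- two rows, ok
try:
    rescale(np.random.uniform(size=(1, 2)))           # only 1 row, but 2 parameters
except IndexError as e:
    print("IndexError:", e)
```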
|
No, that's fine, the function is vectorized (with a loop) here: https://github.com/JohannesBuchner/UltraNest/blob/master/ultranest/integrator.py#L459 That's a really odd bug. Can you print recv_samplesv, num_live_points_todo, and other variables? |
I tried to reproduce with:
which goes into this part of the code:
I was not able to reproduce the crash, however. |
The new run crashed immediately. I added to integrator.py around line 1350:

if self.use_mpi:
    recv_samples = self.comm.gather(active_u, root=0)
    recv_samplesv = self.comm.gather(active_v, root=0)
    recv_likes = self.comm.gather(active_logl, root=0)
    recv_samples = self.comm.bcast(recv_samples, root=0)
    recv_samplesv = self.comm.bcast(recv_samplesv, root=0)
    recv_likes = self.comm.bcast(recv_likes, root=0)
    with open(f"log_{self.mpi_rank}.txt", 'a') as fff:
        print(f"{self.mpi_rank}: TEST #44:\nrecv_samples=%s\nrecv_samplesv=%s\nu=%s\nv=%s\n\n" % (recv_samples, recv_samplesv, active_u, active_v), file=fff)
        print(num_live_points_todo, num_live_points_missing, "\n\n\n\n\n\n\n", file=fff)
    active_u = np.concatenate(recv_samples, axis=0)
    active_v = np.concatenate(recv_samplesv, axis=0)
    active_logl = np.concatenate(recv_likes, axis=0)

I attach logs from rank0, rank1, and rank1000 > min_num_living_points |
When I had just 2 nodes and 80 cores, this worked fine. It seems the issue happens when a core has nothing to do and broadcasts an empty array..? I am new to MPI and still struggle reading the MPI code 😕 |
OK, I see what the problem is. Some nodes have one or more points and call transform() for them, and they produce 2d arrays for active_v, as expected. Some nodes receive no work. They still call transform(), which is a vectorized wrapper around the user-defined transform function. It is defined here: Line 127 in 6461f10
That function does a loop over the proposed points, calls the user transform, and returns the list of results, converted into a numpy array. The numpy array is 2d if there is one point or more, but there is a problem when this function is called with an empty list of points. Then np.asarray cannot figure out the dimensionality it is supposed to produce, so it makes a 1d array of shape (0,) instead. |
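For illustration, the dimensionality collapse and the resulting concatenate failure can be reproduced in plain numpy, independent of the UltraNest internals:

```python
import numpy as np

# a rank that processed points returns a 2d array of transformed points:
full = np.asarray([[15.0, -1.0], [14.0, 11.0]])  # shape (2, 2)
# a rank with no work maps over an empty list; np.asarray cannot infer ndim:
empty = np.asarray([])                           # shape (0,), 1d -- not (0, 2)

try:
    np.concatenate([full, empty], axis=0)
except ValueError as e:
    print("concatenate fails:", e)  # mixing 2d and 1d inputs
```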
Here are two solutions I can think of:

a) we make vectorize() aware of what array shape should be returned. That would be

b) we change the code here before concatenate to strip away empty arrays. That is,

This scenario of being short of points does not occur in other places of the code base. I'm slightly leaning towards b). |
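A minimal sketch of what option b) could look like (a hypothetical helper, not the actual patch): filter out zero-length gather results before concatenating, assuming at least one rank contributed points.

```python
import numpy as np

def concat_nonempty(arrays, axis=0):
    # drop zero-length gather results so a stray 1d empty array
    # cannot break the dimensionality of the concatenation
    nonempty = [a for a in arrays if len(a) > 0]
    return np.concatenate(nonempty, axis=axis)

# two ranks contributed points, one rank had no work:
recv_samplesv = [np.ones((2, 3)), np.asarray([]), np.ones((1, 3))]
print(concat_nonempty(recv_samplesv).shape)  # (3, 3)
```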
PS I forgot numpy is not pythonic =/
|
IIRC numpy will complain that the truth of an array with multiple values is undefined. |
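For reference, this is what numpy raises when a multi-element array lands in a boolean context (e.g. `if a:` instead of `if len(a) > 0:`):

```python
import numpy as np

a = np.array([1.0, 2.0])
try:
    if a:  # multi-element array used as a condition
        pass
except ValueError as e:
    print(e)  # "The truth value of an array with more than one element is ambiguous..."
```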
Actually, I think a better patch would be:
I imagine there could be transforms and loglikelihoods that involve some setup cost, and the above avoids calling them. While the list comprehension would be pythonic and nice, doing such loops in python can be slow if there are thousands of arrays involved. The above makes my hack on the concatenate input obsolete. |
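The shape of that guard might look like the following sketch (hypothetical names; the actual change landed in the PR): skip the user function entirely when a rank has zero points and return a correctly shaped empty array instead, assuming ndim is known.

```python
import numpy as np

def vectorized_transform(user_transform, points, ndim):
    # guard: never call the (possibly expensive) user transform with no work,
    # and keep the 2d shape contract even for zero points
    if len(points) == 0:
        return np.empty((0, ndim))
    return np.asarray([user_transform(p) for p in points])

out = vectorized_transform(lambda c: c * 400 + 400, np.empty((0, 3)), ndim=3)
print(out.shape)  # (0, 3) -- safe to concatenate with (n, 3) arrays
```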
Will you commit this yourself? I agree this is better. |
I still cannot reproduce the bug, so if you can make a (new) PR and test that it solves the problem, that would be better. |
Closed via #45.
Operating System: SUSE Linux Enterprise Server 12 SP5
CPE OS Name: cpe:/o:suse:sles:12:sp5
Kernel: Linux 4.12.14-122.80-default
Architecture: x86-64
Description
With the recent version 3.3.2, I am getting an error when the number of live points changes.
What I Did
(I am not sure I recreated them in the correct order)
Here is the log: I ran it first for half an hour, then the process was killed by slurm, and then I resumed. It started fine, but then crashed when the number of live points changed, and crashed again the same way when I restarted.
The same happened with version 0190fcc (after #43 was merged).
I will try 3.3.0 as well, but I will have to limit the number of nodes to avoid bug #42, so it will take longer...