Improving Consistency Across Optimization Loops #2110
-
Thanks for this thorough study, this is quite interesting. Since this is a relatively low-dimensional search space, my gut feeling is that acquisition function optimization is somewhat unlikely to be the main contributor here, so I think you're right that the next step would be to look into how much variability there is in the model fitting. One thing to note is that you're using

cc @esantorella and @saitcakmak, who have looked into reproducibility of the optimization before.
-
In playing around with BoTorch, I've found a concerning amount of variability between optimization runs that start from the same initial set of data points. The results of my test problem suggest that the random seed has substantial influence over the success of the optimization run. I'm still quite new to this, and it is likely that there is an implementation error on my end, but I wanted to reach out to the community and see if there is something I am missing or some way that I can improve. I've included as much information as possible below to hopefully expose any error in my methods.
TLDR: When variation between optimization loops is limited to random seed selection, is variation in optimizer performance then a function of poor surrogate model fitting, poor acqf optimization, or a little of both?
Starting with a simple Hartmann6 optimization
I've implemented a simple ensemble optimization loop in the code section below:
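A minimal sketch of this kind of loop is below (shown with a `SingleTaskGP` surrogate, `qLogExpectedImprovement`, and `optimize_acqf`; this is an illustration of the setup rather than the exact script that produced the numbers that follow):

```python
# Minimal sketch of the loop described above (illustrative, not the exact script
# behind the reported numbers). All loops share the same initial data; only the
# seed passed to run_loop differs.
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import qLogExpectedImprovement
from botorch.optim import optimize_acqf
from botorch.test_functions import Hartmann
from gpytorch.mlls import ExactMarginalLogLikelihood

hartmann = Hartmann(dim=6, negate=True)  # negated, so the loop maximizes
bounds = torch.stack([torch.zeros(6), torch.ones(6)]).double()

# Shared initial design drawn once, reused by every loop.
torch.manual_seed(0)
init_x = torch.rand(10, 6, dtype=torch.double)
init_y = hartmann(init_x).unsqueeze(-1)

def run_loop(seed: int, n_iter: int = 20) -> float:
    """Run one BO loop from the shared initial data; only the seed varies."""
    torch.manual_seed(seed)
    train_x, train_y = init_x.clone(), init_y.clone()
    for _ in range(n_iter):
        model = SingleTaskGP(train_x, train_y)
        mll = ExactMarginalLogLikelihood(model.likelihood, model)
        fit_gpytorch_mll(mll)
        acqf = qLogExpectedImprovement(model, best_f=train_y.max())
        candidate, _ = optimize_acqf(
            acqf, bounds=bounds, q=1, num_restarts=10, raw_samples=256,
        )
        train_x = torch.cat([train_x, candidate])
        train_y = torch.cat([train_y, hartmann(candidate).unsqueeze(-1)])
    return train_y.max().item()

best_per_seed = [run_loop(seed) for seed in range(10)]
print(best_per_seed)
```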
Running this script, I get the following results, which show a difference of 0.657 between the maximum and minimum optimized values over ten trials, or ~20% of the range of the objective function.
I would expect that increasing `num_restarts` and `raw_samples` would improve this, but the effect seems marginal (0.657 vs. 0.624), as shown in the figure below. I will note that in playing around I have found some combinations that perform better than others at times, but I have been unable to reproduce those results for this post.

Categorical Hartmann Problem Accentuates Discrepancies
The discrepancy between repeat trials also seems to scale with problem complexity. Below I have built a categorical Hartmann3/6 problem wherein each category value encodes an offset Hartmann3/6 objective function. Running with a mixed GP and a mixed acqf optimizer shows a disparity in performance between optimization loops.
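For concreteness, here is a rough sketch of the kind of construction I mean, using `MixedSingleTaskGP` and `optimize_acqf_mixed` (the category count, offsets, and integer encoding are illustrative, and only the Hartmann6 variant is shown):

```python
# Illustrative sketch of the categorical setup (number of categories, offsets,
# and integer encoding are made up for this example). The last input dimension
# is an integer-coded category selecting a per-category offset added to the
# Hartmann6 objective.
import torch
from botorch.models import MixedSingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import qLogExpectedImprovement
from botorch.optim import optimize_acqf_mixed
from botorch.test_functions import Hartmann
from gpytorch.mlls import ExactMarginalLogLikelihood

hartmann = Hartmann(dim=6, negate=True)
offsets = torch.tensor([0.0, 0.25, 0.5, 0.75], dtype=torch.double)  # one per category

def objective(x: torch.Tensor) -> torch.Tensor:
    cat = x[..., -1].round().long()            # integer-coded categorical column
    return hartmann(x[..., :-1]) + offsets[cat]

# 2 x 7 bounds: six continuous dims in [0, 1] plus the categorical code in [0, 3].
bounds = torch.cat(
    [torch.stack([torch.zeros(6), torch.ones(6)]), torch.tensor([[0.0], [3.0]])],
    dim=-1,
).double()

torch.manual_seed(0)
train_x = torch.rand(12, 7, dtype=torch.double) * (bounds[1] - bounds[0]) + bounds[0]
train_x[:, -1] = train_x[:, -1].round()
train_y = objective(train_x).unsqueeze(-1)

model = MixedSingleTaskGP(train_x, train_y, cat_dims=[6])
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
acqf = qLogExpectedImprovement(model, best_f=train_y.max())
candidate, _ = optimize_acqf_mixed(
    acq_function=acqf,
    bounds=bounds,
    q=1,
    num_restarts=10,
    raw_samples=256,
    fixed_features_list=[{6: float(c)} for c in range(4)],  # enumerate categories
)
```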
The results show a large spread in "optimized" values despite similar initial conditions. Loop 4 is a clear outlier, but even setting it aside, the results suggest quite a bit of uncertainty in the performance of the optimization loop.
Visualizing these results further highlights the discrepancy:
Given that all optimization loops start with the same data and vary only by the random seed given to the optimizer, where does the variation come from, and how can it be controlled or reduced? Part of me suspects that this is due to variations in surrogate model fitting within `fit_gpytorch_mll`, but I haven't explored this thoroughly yet.
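A simple way to start probing that suspicion would be to refit the same model on the same data under different seeds and compare what comes out; a minimal sketch (assuming a plain `SingleTaskGP` rather than my exact model):

```python
# Sketch of a way to probe fit variability: refit on the same fixed data under
# different seeds and compare the learned hyperparameters and the achieved
# marginal log likelihood.
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import ExactMarginalLogLikelihood

def fit_summary(train_x: torch.Tensor, train_y: torch.Tensor, seed: int) -> dict:
    torch.manual_seed(seed)  # only matters if fitting hits randomized retries/restarts
    model = SingleTaskGP(train_x, train_y)
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    fit_gpytorch_mll(mll)
    # Evaluate the achieved MLL on the model's own (possibly transformed) training data.
    model.train()
    with torch.no_grad():
        final_mll = mll(model(*model.train_inputs), model.train_targets).item()
    covar = model.covar_module
    kernel = getattr(covar, "base_kernel", covar)  # handle ScaleKernel-wrapped or plain kernels
    return {
        "seed": seed,
        "mll": final_mll,
        "lengthscales": kernel.lengthscale.detach().squeeze().tolist(),
        "noise": model.likelihood.noise.item(),
    }

# e.g. summaries = [fit_summary(train_x, train_y, seed) for seed in range(10)]
```

A large spread in these summaries would point toward the model fitting, while a tight spread would shift suspicion back to the acquisition function optimization.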
Looking specifically at sample point selections, it seems that deviation starts almost immediately for most variables. Below I have plotted the selected variable values across each trial and optimization loop; the top-most plot shows the categorical variable selection, and the continuous variables follow.

Hopefully I've communicated this effectively. Happy to clarify where necessary!