Encoded models #864
Conversation
Added a few comments; just discussion points for our chat at 1.30.
@@ -742,3 +743,159 @@ def covariance_with_top_fidelity(self, query_points: TensorType) -> TensorType:
        :return: The covariance with the top fidelity for the `query_points`, of shape [N, P]
        """
        raise NotImplementedError


class EncodedProbabilisticModel(ProbabilisticModel):
Since this is not abstract (not an interface) I guess this means that none of our interfaces in automotive can extend from it. But I'm not sure if this is a problem, since I don't fully understand the intended usage at this point.
I guess the main question is: do automotive models (like `DeepEnsembleBOModel`) need to implement `predict_encoded` and `sample_encoded`, or is it sufficient for them to implement `predict` and `sample` as they do now? In other words, is it sufficient for only the wrapped models (like Trieste's `DeepEnsemble`) to implement these methods?
Yes, the various TriesteBOModels can stay exactly as they are, as they simply delegate `predict`, etc. to the internal Trieste model (which will now do the encoding for them).
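To make the delegation concrete, here is a minimal sketch of the pattern being described, assuming `EncoderFunction` is just a callable from raw points to encoded points. Only the names quoted in this PR (`encoder`, `encode`, `predict_encoded`, `sample_encoded`) are taken from the diff; the rest (including subclassing `ABC` rather than `ProbabilisticModel`, for self-containment) is illustrative, not the actual implementation.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional, Tuple

import tensorflow as tf

EncoderFunction = Callable[[tf.Tensor], tf.Tensor]


class EncodedProbabilisticModel(ABC):
    """Sketch: public methods encode once, then delegate to *_encoded."""

    @property
    @abstractmethod
    def encoder(self) -> Optional[EncoderFunction]:
        """The encoder to apply to raw points, or None for a no-op."""

    def encode(self, points: tf.Tensor) -> tf.Tensor:
        # Helper for any method that accepts raw (unencoded) points.
        return points if self.encoder is None else self.encoder(points)

    @abstractmethod
    def predict_encoded(self, query_points: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
        """Implementation of predict on already-encoded query points."""

    @abstractmethod
    def sample_encoded(self, query_points: tf.Tensor, num_samples: int) -> tf.Tensor:
        """Implementation of sample on already-encoded query points."""

    def predict(self, query_points: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
        # External callers only ever see this; encoding happens exactly once.
        return self.predict_encoded(self.encode(query_points))

    def sample(self, query_points: tf.Tensor, num_samples: int) -> tf.Tensor:
        return self.sample_encoded(self.encode(query_points))
```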
        return self.encoder(points)

    @abstractmethod
    def predict_encoded(self, query_points: TensorType) -> tuple[TensorType, TensorType]:
I guess I'm missing some context, but I'm wondering when we need to predict on encoded query points (`predict_encoded`) vs raw query points (`predict`).
I'll explain it better in our chat, but essentially external actors will always call `predict` (and typically won't even know that this is an encoded model). Internally, we may need to call `predict_encoded`, e.g. to generate samples, as by that point encoding will already have happened.
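To illustrate that internal-call pattern, here is a hypothetical concrete model (building on the interface sketch above; none of this is from the PR). Its `sample_encoded` reuses `predict_encoded` directly, since its inputs are already encoded; calling `predict` there would encode them a second time.

```python
class GaussianSketchModel(EncodedProbabilisticModel):
    """Hypothetical model with an independent Gaussian predictive."""

    def __init__(self, encoder: Optional[EncoderFunction] = None):
        self._encoder = encoder

    @property
    def encoder(self) -> Optional[EncoderFunction]:
        return self._encoder

    def predict_encoded(self, query_points: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
        # Toy predictive: mean is the feature sum, variance is one.
        mean = tf.reduce_sum(query_points, axis=-1, keepdims=True)
        return mean, tf.ones_like(mean)

    def sample_encoded(self, query_points: tf.Tensor, num_samples: int) -> tf.Tensor:
        # query_points are already encoded here, so we call predict_encoded;
        # predict would (wrongly) encode them again.
        mean, var = self.predict_encoded(query_points)
        shape = tf.concat([[num_samples], tf.shape(mean)], axis=0)
        return mean + tf.sqrt(var) * tf.random.normal(shape, dtype=mean.dtype)
```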
    @property
    @abstractmethod
    def encoder(self) -> EncoderFunction | None:
It seems a little strange to me that `encoder` and `encode` are public methods. But maybe this is necessary.
`encoder` is an abstract property, to ensure that any encoded model has an actual encoder it can use. `encode` is a helper method for models that implement additional methods that need encoding (e.g. `DeepEnsemble.sample_ensemble`).
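A sketch of that second case (hypothetical model and method, reusing the sketches above): an ad hoc, model-specific method that takes raw points has to encode them itself, which is what the public `encode` helper is for.

```python
class EnsembleSketchModel(GaussianSketchModel):
    """Hypothetical model with an extra, model-specific sampling method."""

    def sample_ensemble(self, query_points: tf.Tensor, num_samples: int) -> tf.Tensor:
        # This extra public method receives raw points, so it must encode
        # them itself before reusing the *_encoded internals.
        encoded = self.encode(query_points)
        return self.sample_encoded(encoded, num_samples)
```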
Looks good.
One question though: what is the reason for not updating the multi-fidelity models? Is that going to be in a separate PR? I think for gpflux models we already discussed this and said it would be done in a later PR.
"""Implementation of predict on encoded query points.""" | ||
|
||
@abstractmethod | ||
def sample_encoded(self, query_points: TensorType, num_samples: int) -> TensorType: |
Minor: perhaps add shape checks for the various `*_encoded` methods?
I wasn't sure about that, as this is essentially an internal interface, so if someone wants to mess about with the batch dimensions as part of the encoding, it's kinda fine as long as the external interface (which is shape checked) is still satisfied.
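To make the distinction concrete (a sketch only, not what the PR does): a shape check on the external method can hold regardless of what the encoding does to the trailing feature dimension internally.

```python
def predict(self, query_points: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
    mean, var = self.predict_encoded(self.encode(query_points))
    # The external contract is stated on the raw inputs: leading (batch)
    # dimensions are preserved, whatever the encoder does internally.
    tf.debugging.assert_equal(tf.shape(mean)[:-1], tf.shape(query_points)[:-1])
    return mean, var
```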
I'm really confused by the changes in the `models` package. To me it seems we are unnecessarily appending `_encoded` to all of our methods; why is that needed? We don't seem to keep the old methods without it, so why not simply keep the old names and define a method `maybe_encode` that encodes the data / query points if an encoder is passed through as an argument?
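For reference, the suggestion amounts to something like this (a sketch of the proposal, not code from the PR); the reply further down explains why it isn't sufficient on its own.

```python
def maybe_encode(points: tf.Tensor, encoder: Optional[EncoderFunction]) -> tf.Tensor:
    # Encode only if an encoder was supplied; otherwise pass through.
    return points if encoder is None else encoder(points)
```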
encoder = one_hot_encoder(problem.search_space)
encoded_space = one_hot_encoded_space(problem.search_space)
model = GaussianProcessRegression(
    build_gpr(encode_dataset(initial_data, encoder), encoded_space, likelihood_variance=1e-8),
Perhaps the `encode_dataset` call should happen automatically in the builder itself if a mixed search space is used?
Good suggestion. I'll also add optional arguments in case people want to use a different encoding to one_hot.
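A hypothetical shape for that builder behaviour; `build_gpr`, `one_hot_encoder`, `one_hot_encoded_space` and `encode_dataset` are from the PR, while the wrapper name and its keyword arguments are assumptions.

```python
def build_encoded_gpr(data, search_space, encoder=None, **gpr_kwargs):
    # Default to one-hot encoding, but let callers supply an alternative
    # (in which case the encoded space should be derived to match it).
    encoder = encoder if encoder is not None else one_hot_encoder(search_space)
    encoded_space = one_hot_encoded_space(search_space)
    return build_gpr(encode_dataset(data, encoder), encoded_space, **gpr_kwargs)
```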
# 6 categories mapping to 3 random points plus the 3 minimizer points
points = tf.concat(
    [tf.random.uniform([3], dtype=tf.float64), ScaledBranin.minimizers[..., 0]], 0
)
I'm not sure I get the logic behind this problem...
The first dimension is a categorical space with 6 points, three of which are derived from the minimizers (to ensure the problem minimum corresponds to the global scaled branin minimum) and the other three randomly selected. We then shuffle these, as the order of the categories shouldn't affect how we treat them.
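In code, the construction described is roughly the following; the `tf.concat` line is from the diff, while the shuffle line is inferred from the description above.

```python
# Six category values for the first input dimension: three random points
# plus the three Branin minimizer coordinates, shuffled so that category
# order carries no information.
points = tf.concat(
    [tf.random.uniform([3], dtype=tf.float64), ScaledBranin.minimizers[..., 0]], 0
)
points = tf.random.shuffle(points)
```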
@@ -190,3 +199,86 @@ def test_optimizer_finds_minima_of_the_scaled_branin_function(
    acquisition_function = acquisition_rule._acquisition_function
    if isinstance(acquisition_function, AcquisitionFunctionClass):
        assert acquisition_function.__call__._get_tracing_count() <= 4  # type: ignore


def categorical_scaled_branin(
Can you clarify what you are doing here?
tf.constant([[0], [0]], dtype=tf.float64),
tf.constant([[1], [1]], dtype=tf.float64),
Why test only on float64?
(changed a couple to float32, though dtypes are also tested elsewhere)
Keeping the old names is not possible for multiple reasons:

1. We need both the external (encoding) method and the internal (already-encoded) method, as method implementations often call each other: e.g. `sample` might call `predict`, and we don't want it to encode twice.
2. Automatically rewriting class methods using a mixin requires either a metaclass (which doesn't work with ABCs) or a class decorator (which confuses the IDE).
3. We want to ensure people subclassing an encoded class with a new implementation of `predict` don't accidentally break encoding.

Also worth noting that encoding affects a lot of methods (including some ad hoc ones only used by specific model types), and that the datasets and query points are bare tensors, so they have no way of remembering whether they've already been encoded.
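A sketch of the double-encoding hazard from the first reason (hypothetical code, reusing the sketches above):

```python
class BadModel(GaussianSketchModel):
    def sample(self, query_points: tf.Tensor, num_samples: int) -> tf.Tensor:
        # If public methods both encoded their own inputs *and* cross-called
        # each other, points would get encoded twice:
        encoded = self.encode(query_points)
        mean, var = self.predict(encoded)  # BUG: predict encodes again
        shape = tf.concat([[num_samples], tf.shape(mean)], axis=0)
        return mean + tf.sqrt(var) * tf.random.normal(shape, dtype=mean.dtype)
```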
Related issue(s)/PRs:
Summary
Support for encoded models. Follows on from #863.
Fully backwards compatible: yes / no
PR checklist