Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

159 feat bms return variable number of models #161

Merged
merged 17 commits into from
Dec 15, 2022
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
46a3f51
resolve MAJOR nonbreaking bug that was preventing the parallel trees …
TheLemonPig Dec 5, 2022
375a5d7
add variable to BMSRegressor to store multiple models
TheLemonPig Dec 5, 2022
def8cce
update .gitignore with .dat files made by test runs
TheLemonPig Dec 5, 2022
d06d488
simplify by holding all trees, rather a number you have to specify be…
TheLemonPig Dec 5, 2022
842cf16
Merge branch 'main' into 159-feat-bms-return-variable-number-of-models
TheLemonPig Dec 6, 2022
348543e
test: add testcase for models_
hollandjg Dec 6, 2022
ee06f86
removing lines to resolve in another PR
TheLemonPig Dec 6, 2022
f15e15f
Merge branch '159-feat-bms-return-variable-number-of-models' into 159…
hollandjg Dec 6, 2022
aed52ab
Merge branch 'main' into 159-feat-bms-return-variable-number-of-models
TheLemonPig Dec 7, 2022
5d80834
Update tests/test_bms_multi_model_output.py
TheLemonPig Dec 7, 2022
6b4d7d3
Merge branch '159-feat-bms-return-variable-number-of-models' into 159…
hollandjg Dec 8, 2022
b3ad2d3
Merge branch 'main' into 159-feat-bms-return-variable-number-of-models
hollandjg Dec 8, 2022
4b22a87
Merge branch '159-feat-bms-return-variable-number-of-models' into 159…
TheLemonPig Dec 8, 2022
c235992
Merge pull request #165 from AutoResearch/159-feat-bms-return-variabl…
TheLemonPig Dec 8, 2022
5ccabd2
Merge branch 'main' into 159-feat-bms-return-variable-number-of-models
hollandjg Dec 15, 2022
5aa11f2
fix: update path to autora.theorist.bms.Tree
hollandjg Dec 15, 2022
387005e
test: make test results less verbose
hollandjg Dec 15, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -70,4 +70,4 @@ dmypy.json
site/

# Jupyter Notebook load data
.ipynb_checkpoints
.ipynb_checkpoints
2 changes: 2 additions & 0 deletions autora/skl/bms.py
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@ def __init__(
self.X_: Optional[np.ndarray] = None
self.y_: Optional[np.ndarray] = None
self.model_: Tree = Tree()
self.models_: List[Tree] = [Tree()]
self.loss_: float = np.inf
self.cache_: List = []
self.variables: List = []
Expand Down Expand Up @@ -110,6 +111,7 @@ def fit(self, X: np.ndarray, y: np.ndarray, num_param: int = 1) -> BMSRegressor:
prior_par=self.prior_par,
)
self.model_, self.loss_, self.cache_ = utils.run(self.pms, self.epochs)
self.models_ = list(self.pms.trees.values())
TheLemonPig marked this conversation as resolved.
Show resolved Hide resolved

_logger.info("BMS fitting finished")
self.X_, self.y_ = X, y
Expand Down
13 changes: 7 additions & 6 deletions autora/theorist/bms/parallel.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ def __init__(
Ts.sort()
self.Ts = [str(T) for T in Ts]
self.trees = {
"1": Tree(
"1.0": Tree(
ops=ops,
variables=deepcopy(variables),
parameters=deepcopy(parameters),
Expand All @@ -60,7 +60,7 @@ def __init__(
BT=1,
)
}
self.t1 = self.trees["1"]
self.t1 = self.trees["1.0"]
for BT in [T for T in self.Ts if T != 1]:
treetmp = Tree(
ops=ops,
Expand All @@ -87,7 +87,7 @@ def mcmc_step(self, verbose=False, p_rr=0.05, p_long=0.45) -> None:
for T, tree in list(self.trees.items()):
# MCMC step
tree.mcmc_step(verbose=verbose, p_rr=p_rr, p_long=p_long)
self.t1 = self.trees["1"]
self.t1 = self.trees["1.0"]

# -------------------------------------------------------------------------
def tree_swap(self) -> Tuple[Optional[str], Optional[str]]:
Expand Down Expand Up @@ -119,7 +119,7 @@ def tree_swap(self) -> Tuple[Optional[str], Optional[str]]:
self.trees[self.Ts[nT2]] = t1
t1.BT = BT2
t2.BT = BT1
self.t1 = self.trees["1"]
self.t1 = self.trees["1.0"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the 1.0 do?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably easier to see looking in the code but I'll do my best to describe it. The keys of the dictionary of temperatures are created as numerical values and then recast as strings. Originally the code contained one integer 1 and the rest floats. When we type casted, we had to change an integer to a float. 1 became 1.0, which is different when cast as a string and so is a distinct key value. The code assumes a one-to-one correspondence between temperature values and dictionary keys. The code also hard codes one tree with 1 and then iteratively creates trees from the temperatures, but skips 1 (which we changed to 1.0). So the parallel machine scientist held a tree for a temperature value of 1 and of 1.0, and considers the the key "1" to hold the best model and ignores "1.0". When comparing trees in tree_swap(), it only chooses among those that correspond to a temperature value stored in, which are recorded as floats, and so ignores "1". Hence the model selected by BMS is unaffected by tree_swap(). The tree corresponding to "1" does improve over time because it is still affected by mcmc_step(). However, we lose the functionality of having higher temperatures, tree swaps, and the other 95% of the training results.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tl;dr: when we type set temperatures to be floats, the hooks forced us to change a 1 to a 1.0. Turns out there was some hard coding which relied on it being specifically 1.

return self.Ts[nT1], self.Ts[nT2]
else:
return None, None
Expand All @@ -140,7 +140,7 @@ def anneal(self, n=1000, factor=5) -> None:
t.BT *= factor
for kk in range(n):
print(
"# Annealing heating at %g: %d / %d" % (self.trees["1"].BT, kk, n),
"# Annealing heating at %g: %d / %d" % (self.trees["1.0"].BT, kk, n),
file=sys.stderr,
)
self.mcmc_step()
Expand All @@ -150,7 +150,8 @@ def anneal(self, n=1000, factor=5) -> None:
t.BT = float(BT)
for kk in range(2 * n):
print(
"# Annealing cooling at %g: %d / %d" % (self.trees["1"].BT, kk, 2 * n),
"# Annealing cooling at %g: %d / %d"
% (self.trees["1.0"].BT, kk, 2 * n),
file=sys.stderr,
)
self.mcmc_step()
Expand Down
25 changes: 25 additions & 0 deletions tests/test_bms_multi_model_output.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
import numpy as np
import pytest

from autora.skl.bms import BMSRegressor
from autora.theorist.bms import Tree


@pytest.fixture
def curve_to_fit():
x = np.linspace(-10, 10, 100).reshape(-1, 1)
y = (x**3.0) + (2.0 * x**2.0) + (17.0 * x) - 1
return x, y


def test_bms_models(curve_to_fit):
x, y = curve_to_fit
regressor = BMSRegressor(epochs=100)

regressor.fit(x, y)

print(regressor.models_)

assert len(regressor.models_) == len(regressor.ts) # Currently hardcoded
for model in regressor.models_:
assert isinstance(model, Tree)