Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

159 feat bms return variable number of models #161

Merged
merged 17 commits into from
Dec 15, 2022

Conversation

TheLemonPig
Copy link
Collaborator

Description

Simple attribute added to BMSRegressor which contains all parallel trees. This is for a new experimentalist which compares fit between models. A major nonbreaking bug was spotted and resolved in the process (which will allow BMS to work better with more difficult data).

_Fixes #159 _

Type of change:

  • Bug fix
  • New feature (non-breaking change which adds functionality)

Features:

  • Regressor attribute containing multiple models which best fit the data at different temperatures

Questions:

Remarks:

  • I can't believe the bug wasn't spotted. We weren't using the parallel trees at all. The issue is essentially related to the type inconsistency in mcmc.py - during type setting, we changed a '1' to a '1.0', which caused major issues when changing floats to strings for a dictionary
  • BMS will work much better now! Especially for recovering difficult equations

@TheLemonPig TheLemonPig requested a review from musslick as a code owner December 5, 2022 21:45
@TheLemonPig TheLemonPig linked an issue Dec 5, 2022 that may be closed by this pull request
@musslick musslick requested a review from hollandjg December 5, 2022 22:25
@benwandrew
Copy link
Collaborator

@TheLemonPig good catch! should we be more generally concerned about the convoluted type changes in mcmc.py now?

@TheLemonPig
Copy link
Collaborator Author

@benwandrew I am hoping not. Most of it definitely not. But hard to say for sure

Copy link
Member

@hollandjg hollandjg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!
I've one major suggestion: I think a test case for the new functionality would are really useful.
I've made a start in #165, but I think it would be good to show how you expect people to use this functionality.

.gitignore Outdated Show resolved Hide resolved
@@ -119,7 +119,7 @@ def tree_swap(self) -> Tuple[Optional[str], Optional[str]]:
self.trees[self.Ts[nT2]] = t1
t1.BT = BT2
t2.BT = BT1
self.t1 = self.trees["1"]
self.t1 = self.trees["1.0"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the 1.0 do?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably easier to see looking in the code but I'll do my best to describe it. The keys of the dictionary of temperatures are created as numerical values and then recast as strings. Originally the code contained one integer 1 and the rest floats. When we type casted, we had to change an integer to a float. 1 became 1.0, which is different when cast as a string and so is a distinct key value. The code assumes a one-to-one correspondence between temperature values and dictionary keys. The code also hard codes one tree with 1 and then iteratively creates trees from the temperatures, but skips 1 (which we changed to 1.0). So the parallel machine scientist held a tree for a temperature value of 1 and of 1.0, and considers the the key "1" to hold the best model and ignores "1.0". When comparing trees in tree_swap(), it only chooses among those that correspond to a temperature value stored in, which are recorded as floats, and so ignores "1". Hence the model selected by BMS is unaffected by tree_swap(). The tree corresponding to "1" does improve over time because it is still affected by mcmc_step(). However, we lose the functionality of having higher temperatures, tree swaps, and the other 95% of the training results.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tl;dr: when we type set temperatures to be floats, the hooks forced us to change a 1 to a 1.0. Turns out there was some hard coding which relied on it being specifically 1.

Copy link
Collaborator

@musslick musslick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

autora/skl/bms.py Show resolved Hide resolved
@TheLemonPig
Copy link
Collaborator Author

This is great! I've one major suggestion: I think a test case for the new functionality would are really useful. I've made a start in #165, but I think it would be good to show how you expect people to use this functionality.

The attribute was added for the dissimilarity experimentalist. Currently, there is not code elsewhere in the BMS to make use of this attribute. The print statement in #165 is the only use of it that I can think of. We could add an argument to present results to present the results of all the models perhaps.

hollandjg and others added 4 commits December 8, 2022 08:51
@TheLemonPig TheLemonPig requested a review from hollandjg December 8, 2022 15:23
@TheLemonPig
Copy link
Collaborator Author

This is great! I've one major suggestion: I think a test case for the new functionality would are really useful. I've made a start in #165, but I think it would be good to show how you expect people to use this functionality.

The attribute was added for the dissimilarity experimentalist. Currently, there is not code elsewhere in the BMS to make use of this attribute. The print statement in #165 is the only use of it that I can think of. We could add an argument to present results to present the results of all the models perhaps.

I have just created an issue to further address this: AutoResearch/autora-theorist-bms#31

Copy link
Member

@hollandjg hollandjg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's something broken here when running under python 3.9 – the test cases (test_tree_mcmc_stepping) suddenly takes ~30 minutest to run. Doesn't seem to affect python 3.8 or 3.10. We should fix that before we merge, because this will cause us problems with the tests of all our future code if we don't.

@TheLemonPig TheLemonPig merged commit 8ac65a6 into main Dec 15, 2022
@TheLemonPig TheLemonPig deleted the 159-feat-bms-return-variable-number-of-models branch December 15, 2022 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: BMS return variable number of models
4 participants