-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(WIP) Functionality for choosing a random move weighted by P outputs. #1732
Conversation
Add randombyp functionality
Add randombyp functionality.
Add randombyp functionality.
Add randombyp functionality.
Added a function to square the P values, however, learned that it is not necessary thanks to PST, so I will be reverting that change soon. |
Change has been reverted. Feature and code is now ready for merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I don't fully like is that by default with this option enabled, Lc0 will still think long but in the end this calculation will be not used...
So maybe consider those options:
Option 1
Make this option also disable search (and maybe rename it to something like MaiaMode
)
Option 2 (I like it more)
Change function GetBestRootChildWithTemperature
to select move by P when there are no N
in any of child nodes. That way it will kind of work "naturally", will use P
instead of N
if search is turned off. It will also make sense to use temperature
to scale P values, that way all the standard temperature curve functionality would automatially work. But even without per-move curve, it would be possible to control randomness more gradually instead of having a "best P vs randomly weighted by P" swtich.
src/mcts/params.cc
Outdated
@@ -99,6 +99,9 @@ const OptionId SearchParams::kTwoFoldDrawsId{ | |||
"Evaluates twofold repetitions in the search tree as draws. Visits to " | |||
"these positions are reverted when the first occurrence is played " | |||
"and not in the search tree anymore."}; | |||
const OptionId SearchParams::kRandombyP{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kRandombyP
-> kRandomByPId
(note the Id
suffix)
also to be consistent witht the rest of flags, probably "random-by-p" and "RandomByP"
But actually, I think the name can be improved. MoveSelectionByP
maybe? Or even MaiaMode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I missed that! Thank you!
src/mcts/params.h
Outdated
@@ -54,6 +54,7 @@ class SearchParams { | |||
return at_root ? kCpuctFactorAtRoot : kCpuctFactor; | |||
} | |||
bool GetTwoFoldDraws() const { return kTwoFoldDraws; } | |||
float GetRandombyP() const { return options_.Get<bool>(kRandombyP); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here I'd write GetRandomByP
with capital "B"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll add that.
src/mcts/search.cc
Outdated
// In case of floating point subtraction issues above. | ||
return *root_node_->Edges(); | ||
|
||
assert(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed as there is return above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove the assert. I forgot to remove it after I fixed the return.
// Get sum of weights for roll. | ||
float total_weights = 0.0; | ||
for (auto& edge : root_node_->Edges()) { | ||
total_weights += edge.GetP(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the sum of P is always 1.0, no need to sum them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it? I'll have to look into that. If so, then I'll just remove this section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sum of P is only not 1.0 if there are moves which lead to a forced loss, as these get their policy zeroed. However, this will only happen with some amount of search.
I'd still propose to keep this logic, and use it to introduce a parameter for policy cutoff like suggested in #1734, so absolute blunders can be suppressed.
src/mcts/search.cc
Outdated
@@ -621,6 +622,7 @@ void Search::EnsureBestMoveKnown() REQUIRES(nodes_mutex_) | |||
auto bestmove_edge = temperature | |||
? GetBestRootChildWithTemperature(temperature) | |||
: GetBestChildNoTemperature(root_node_, 0); | |||
if (randombyp) bestmove_edge = GetRandomChildbyP(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems wasteful to call standard GetBestChild..() functions and then overwrite with GetRandomChildByP.
Maybe combine to one expression?
auto bestmove_edge = params_.GetRandomByP() ? GetRandomChildByP()
: temperature ? GetBestRootWithTemp...
: GetBestNoTemp..;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a much better way of doing it. Thank you!
I definitely considered Option 1. I will look into that one. Option 2, if I'm understanding it correctly, overrides the ability to have the engine run with 1 node without randomness. Even if someone were to set PST to 0.1 (the lowest accepted value according to some documentation?), it would still introduce undesirable randomness in that situation. It's also a bit unintuitive for the engine to suddenly introduce randomness when it's set to a 1 node limit, even if that randomness could be disabled through PST. So I feel like it's better to have a switch where it's either random or not random, and the amount of randomness is controlled by PST. Unless I misunderstood, I'm going to go with Option 1 and disable search when this option is enabled. Also, I'm not sure |
In Option 2, by default it will just play the move with highest P (in fact, GetBestMoveWithTemperature won't be called, GetBestMoveNoTemperature will be called instead). But if the temperature is set to 1, it will play the move according to probabilities (and other variants are also possible, like 0.5 to be something in the middle, and +inf to play all moves with the same probability). That way it would be possible not to introduce additional parameters at all, and reuse the existing parameters controlling randomness (namely, temperature). |
Oh, I think understand what you're saying now. You're saying that if the node limit is 1, and a temperature is set, then I could set the default functionality to be choosing by P? I'm trying to wrap my head around how that would work a bit. I would still need to set the policy softmax temperature in order to get the desired probabilities (1.0 for default probabilities), and that would actually be the control for how much randomness there is, so I would be essentially ignoring temperature in the process, even though I set it to something. Or rather, setting temperature to anything other than 0.0 would be enabling the functionality assuming the depth of search is only 1 node, and the policy softmax temperature would be the randomness control. So after adding that functionality, it would look like Am I understanding this correctly? I feel like the desired functionality would be simply setting the temperature and using that as the control, but the policy softmax temperature also plays a part and would need to be changed before the evaluation begins. |
Another thing that is bothering me about using the setting of temperature instead of an explicit parameter is the idea of creating some kind of "hidden" functionality in the engine that requires knowledge of the code base unless this hidden functionality is written out in the documentation about temperature. I understand that it's more efficient to use existing parameters to activate it as far as modification to code goes, but as far as usability goes, it's extremely unintuitive to the point of almost seeming intentionally obscured. And considering that in either case, policy softmax temp requires setting anyway, I would rather go with adding a readable parameter than obscuring the functionality within an edge case of a function that serves a different purpose. Because of this, I feel that keeping and using the parameter |
The more I think about it, the more I like the idea of integrating it into It's not hidden actually, to me it looks like it should be the expected way for users.
I've just attempted to start a discussion on Discord, but noone responded so far. |
A better method has been decided and a different pull request will be created. |
This feature is intended to be used with Maia nets to allow them to play with randomized variety weighted by the move probabilities. But it can really be used with any other networks. And it could also potentially be used for a different style of training.
Enabled with
--randombyp=true
.Using it with a node depth greater than 1 is a waste of resources because it just uses the P values to determine the move.
It is recommended that you also set PST to 1.0 (instead of the default) if you want the original P values to be used. PST of less than 1.0 will favor the more likely moves, resulting in play with less obvious blunders.