(WIP) Functionality for choosing a random move weighted by P outputs. #1732

Closed
danegraphics wants to merge 15 commits


@danegraphics danegraphics commented Apr 15, 2022

This feature is intended for Maia nets, letting them play with randomized variety weighted by the move probabilities. It works with any other network as well, and could potentially be used for a different style of training.

Enabled with --randombyp=true.

Using it with a node limit greater than 1 wastes resources, because the move is chosen from the P values alone.

It is recommended that you also set PST (policy softmax temperature) to 1.0 instead of the default if you want the original P values to be used. A PST below 1.0 favors the more likely moves, resulting in play with fewer obvious blunders.
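To illustrate why a lower PST favors the more likely moves, here is a minimal sketch of a softmax with temperature. It is not lc0's code: the helper name `SoftmaxWithTemperature` and the idea of feeding it raw policy logits are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not lc0 code: turns raw policy logits into move
// probabilities using a softmax with temperature `pst` (must be > 0).
std::vector<float> SoftmaxWithTemperature(const std::vector<float>& logits,
                                          float pst) {
  std::vector<float> p(logits.size());
  if (logits.empty()) return p;
  const float max_logit = *std::max_element(logits.begin(), logits.end());
  float sum = 0.0f;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    // Subtracting the maximum keeps exp() from overflowing.
    p[i] = std::exp((logits[i] - max_logit) / pst);
    sum += p[i];
  }
  // Normalize so the probabilities add up to 1.
  for (float& v : p) v /= sum;
  return p;
}
```

With logits {2, 1, 0}, the top move gets about 0.67 at a temperature of 1.0 and about 0.87 at 0.5, so a weighted random pick lands on the weaker moves less often.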

@danegraphics danegraphics changed the title Functionality for choosing a random move weighted by probability outputs. (WIP) Functionality for choosing a random move weighted by probability outputs. Apr 16, 2022
@danegraphics
Contributor Author

I added a function to square the P values, but learned that PST makes it unnecessary, so I will be reverting that change soon.

@danegraphics
Contributor Author

The change has been reverted. The feature and code are now ready for merge.

@danegraphics danegraphics changed the title (WIP) Functionality for choosing a random move weighted by probability outputs. Functionality for choosing a random move weighted by probability outputs. Apr 17, 2022
@danegraphics danegraphics changed the title Functionality for choosing a random move weighted by probability outputs. Functionality for choosing a random move weighted by P outputs. Apr 17, 2022
Member

@mooskagh mooskagh left a comment

What I don't fully like is that, by default, with this option enabled Lc0 will still think for a long time, but in the end that calculation will not be used...

So maybe consider these options:

Option 1
Make this option also disable search (and maybe rename it to something like MaiaMode)

Option 2 (I like it more)
Change the function GetBestRootChildWithTemperature to select the move by P when none of the child nodes has any N. That way it will work "naturally": it will use P instead of N if search is turned off. It also makes sense to use temperature to scale the P values, so that all the standard temperature curve functionality works automatically. Even without a per-move curve, it would be possible to control randomness gradually instead of having a "best P vs. randomly weighted by P" switch.
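A rough sketch of what Option 2 describes, under stated assumptions: the `Child` struct, `SelectionWeights`, and `PickChild` are hypothetical stand-ins rather than lc0's real types or functions, and the temperature is assumed to be applied as an exponent of 1/temperature on the chosen quantity.

```cpp
#include <cmath>
#include <random>
#include <vector>

// Hypothetical stand-in for a root child; lc0's real edge/node types differ.
struct Child {
  float p;  // Policy prior P.
  int n;    // Visit count N.
};

// Returns per-child selection weights: N-based when any child was visited,
// P-based otherwise. Raising to 1/temperature sharpens (T < 1) or flattens
// (T > 1) the distribution. Requires temperature > 0.
std::vector<double> SelectionWeights(const std::vector<Child>& children,
                                     double temperature) {
  bool any_visits = false;
  for (const Child& c : children) any_visits = any_visits || c.n > 0;
  std::vector<double> weights;
  weights.reserve(children.size());
  for (const Child& c : children) {
    const double base = any_visits ? static_cast<double>(c.n) : c.p;
    weights.push_back(std::pow(base, 1.0 / temperature));
  }
  return weights;
}

// Picks a child index at random in proportion to the weights above.
// Assumes at least one weight is positive.
template <typename Rng>
int PickChild(const std::vector<Child>& children, double temperature,
              Rng& rng) {
  const std::vector<double> w = SelectionWeights(children, temperature);
  std::discrete_distribution<int> dist(w.begin(), w.end());
  return dist(rng);
}
```

With no visits anywhere, a temperature of 1.0 samples moves in proportion to P, while lower temperatures concentrate the pick on the highest-P moves, which is the gradual control the option aims for.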

@@ -99,6 +99,9 @@ const OptionId SearchParams::kTwoFoldDrawsId{
"Evaluates twofold repetitions in the search tree as draws. Visits to "
"these positions are reverted when the first occurrence is played "
"and not in the search tree anymore."};
const OptionId SearchParams::kRandombyP{
Member

kRandombyP -> kRandomByPId (note the Id suffix)
Also, to be consistent with the rest of the flags, probably "random-by-p" and "RandomByP".

But actually, I think the name can be improved. MoveSelectionByP maybe? Or even MaiaMode.

Contributor Author

Oh, I missed that! Thank you!

@@ -54,6 +54,7 @@ class SearchParams {
return at_root ? kCpuctFactorAtRoot : kCpuctFactor;
}
bool GetTwoFoldDraws() const { return kTwoFoldDraws; }
float GetRandombyP() const { return options_.Get<bool>(kRandombyP); }
Member

Also, here I'd write GetRandomByP with a capital "B".

Contributor Author

Yes, I'll add that.

// In case of floating point subtraction issues above.
return *root_node_->Edges();

assert(false);
Member

This is not needed, as there is a return above.

Contributor Author

I'll remove the assert. I forgot to remove it after I fixed the return.

// Get sum of weights for roll.
float total_weights = 0.0;
for (auto& edge : root_node_->Edges()) {
total_weights += edge.GetP();
Member

I think the sum of P is always 1.0, no need to sum them.

Contributor Author

Is it? I'll have to look into that. If so, then I'll just remove this section.

Contributor

The sum of P differs from 1.0 only if there are moves that lead to a forced loss, as these get their policy zeroed. However, this will only happen with some amount of search.

I'd still propose keeping this logic and using it to introduce a parameter for a policy cutoff, as suggested in #1734, so that absolute blunders can be suppressed.
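One way to read this suggestion, as a sketch only: keep the summed weights so the roll stays valid after low-P moves are filtered out. The function `PickByPolicy` and the `min_p` threshold are hypothetical, and the actual cutoff proposed in #1734 may work differently. Unlike the PR's fallback of returning the first edge, this version falls back to the last eligible move when rounding leaves a remainder.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the PR's code. `priors` holds P for each root
// move, `roll` is a uniform random number in [0, 1), and `min_p` is an
// assumed cutoff below which a move is never picked.
// Returns the chosen index, or priors.size() if no move passes the cutoff.
std::size_t PickByPolicy(const std::vector<float>& priors, float roll,
                         float min_p) {
  // Sum only the weights that survive the cutoff, so the roll stays valid
  // even when the remaining P values do not add up to 1.0.
  float total = 0.0f;
  for (float p : priors) {
    if (p >= min_p) total += p;
  }
  if (total <= 0.0f) return priors.size();

  float remaining = roll * total;
  std::size_t last_eligible = priors.size();
  for (std::size_t i = 0; i < priors.size(); ++i) {
    if (priors[i] < min_p) continue;
    last_eligible = i;
    if (remaining < priors[i]) return i;
    remaining -= priors[i];
  }
  // Rounding in the subtractions can leave a tiny remainder; fall back to
  // the last eligible move instead of asserting.
  return last_eligible;
}
```

With priors {0.5, 0.3, 0.2} and a cutoff of 0.25, the third move can never be selected, and the roll is rescaled over the remaining 0.8 of probability mass.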

@@ -621,6 +622,7 @@ void Search::EnsureBestMoveKnown() REQUIRES(nodes_mutex_)
auto bestmove_edge = temperature
? GetBestRootChildWithTemperature(temperature)
: GetBestChildNoTemperature(root_node_, 0);
if (randombyp) bestmove_edge = GetRandomChildbyP();
Member

It seems wasteful to call standard GetBestChild..() functions and then overwrite with GetRandomChildByP.
Maybe combine to one expression?

auto bestmove_edge = params_.GetRandomByP() ? GetRandomChildByP()
                     : temperature ? GetBestRootWithTemp... 
                                   : GetBestNoTemp..;

Contributor Author

@danegraphics danegraphics Apr 22, 2022

That's a much better way of doing it. Thank you!

@danegraphics
Contributor Author

danegraphics commented Apr 22, 2022

I definitely considered Option 1. I will look into that one.

Option 2, if I'm understanding it correctly, overrides the ability to have the engine run with 1 node without randomness. Even if someone were to set PST to 0.1 (the lowest accepted value according to some documentation?), it would still introduce undesirable randomness in that situation.

It's also a bit unintuitive for the engine to suddenly introduce randomness when it's set to a 1 node limit, even if that randomness could be disabled through PST.

So I feel like it's better to have a switch where it's either random or not random, and the amount of randomness is controlled by PST.

Unless I misunderstood, I'm going to go with Option 1 and disable search when this option is enabled.

Also, I'm not sure Maia Mode is a good name for it. This mode can be used with all kinds of other nets or for training, and it isn't part of Maia's originally intended functionality, so implying that it should be used whenever a Maia net is loaded feels odd. MoveSelectionByP also feels wrong because it implies that the best move according to policy will always be selected. I'll think about the name a bit more, but I feel it should be general and imply a random selection based on policy output.

@danegraphics danegraphics changed the title Functionality for choosing a random move weighted by P outputs. (WIP) Functionality for choosing a random move weighted by P outputs. Apr 22, 2022
@mooskagh
Member

In Option 2, by default it will just play the move with the highest P (in fact, GetBestRootChildWithTemperature won't be called; GetBestChildNoTemperature will be called instead).

But if the temperature is set to 1, it will play the move according to probabilities (and other variants are also possible, like 0.5 to be something in the middle, and +inf to play all moves with the same probability).

That way it would be possible not to introduce additional parameters at all, and reuse the existing parameters controlling randomness (namely, temperature).
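The behavior at these temperature values can be written out explicitly. This is a sketch that assumes the temperature $T$ enters as an exponent $1/T$ on the policy values $P_i$, with $\sum_j P_j = 1$; the engine's actual formula may differ in detail.

```latex
\Pr(\text{move } i) = \frac{P_i^{1/T}}{\sum_j P_j^{1/T}}
\qquad
\begin{cases}
T = 1: & \Pr(i) = P_i \\
T \to \infty: & \Pr(i) \to \dfrac{1}{\#\{j : P_j > 0\}} \quad \text{for } P_i > 0 \\
T \to 0^{+}: & \Pr(i) \to 1 \quad \text{for a unique } i = \arg\max_j P_j
\end{cases}
```

For example, with $P = (0.7, 0.2, 0.1)$ and $T = 0.5$ the weights are $(0.49, 0.04, 0.01)$, which normalize to roughly $(0.907, 0.074, 0.019)$: still random, but noticeably more concentrated on the top move than at $T = 1$.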

@danegraphics
Contributor Author

danegraphics commented Apr 22, 2022

Oh, I think I understand what you're saying now.

You're saying that if the node limit is 1, and a temperature is set, then I could set the default functionality to be choosing by P?

I'm trying to wrap my head around how that would work a bit.

I would still need to set the policy softmax temperature in order to get the desired probabilities (1.0 for default probabilities), and that would actually be the control for how much randomness there is, so I would be essentially ignoring temperature in the process, even though I set it to something.

Or rather, setting temperature to anything other than 0.0 would be enabling the functionality assuming the depth of search is only 1 node, and the policy softmax temperature would be the randomness control.

So after adding that functionality, it would look like ./lc0 --temperature=1.0 --policy-softmax-temp=1.0 and then go nodes 1 and it would work?

Am I understanding this correctly?

I feel like the desired functionality would be simply setting the temperature and using that as the control, but the policy softmax temperature also plays a part and would need to be changed before the evaluation begins.

@danegraphics
Contributor Author

danegraphics commented Apr 22, 2022

Another thing that is bothering me about using the setting of temperature instead of an explicit parameter is the idea of creating some kind of "hidden" functionality in the engine that requires knowledge of the code base unless this hidden functionality is written out in the documentation about temperature.

I understand that it's more efficient to use existing parameters to activate it as far as modification to code goes, but as far as usability goes, it's extremely unintuitive to the point of almost seeming intentionally obscured.

And considering that in either case, policy softmax temp requires setting anyway, I would rather go with adding a readable parameter than obscuring the functionality within an edge case of a function that serves a different purpose.

Because of this, I feel that keeping and using the parameter RandomByP is a better option.

@mooskagh
Member

The more I think about it, the more I like the idea of integrating it into GetBestRootChildWithTemperature :), just to do a temperature-based pick based on P instead of N if all N=0.

It's not hidden actually, to me it looks like it should be the expected way for users.
Users know that when they need randomness, they tune temperature parameters, and there should be no difference whether there is a search or not (or e.g. if search is enabled but didn't happen due to low time budget).
So having separate ways to enable randomness in different modes looks confusing.

--policy-softmax-temp is an irrelevant flag here: it affects MCTS visits rather than move selection.

I've just attempted to start a discussion on Discord, but no one has responded so far.

@danegraphics
Contributor Author

A better method has been decided and a different pull request will be created.

@danegraphics danegraphics deleted the randombyp branch April 23, 2022 12:51

3 participants