(WIP) Functionality for choosing a random move weighted by P outputs. #1732

danegraphics · 2022-04-15T20:37:51Z

This feature is intended for Maia nets, so that they can play with randomized variety weighted by the move probabilities, but it can be used with any network. It could also potentially be used for a different style of training.

Enabled with --randombyp=true.

Using it with a node limit greater than 1 wastes resources, because the move is determined from the P values alone.

It is recommended that you also set the policy softmax temperature (PST) to 1.0 instead of the default if you want the original P values to be used. A PST below 1.0 favors the more likely moves, resulting in play with fewer obvious blunders.
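PST divides the raw policy logits by T before the softmax, so T < 1.0 sharpens the distribution toward the likeliest moves and T > 1.0 flattens it. A minimal sketch, assuming access to raw logits; `PolicyWithTemperature` is a hypothetical helper, not Lc0's API:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: converts raw policy logits (must be non-empty) into
// move probabilities using a softmax with temperature `pst`.
std::vector<double> PolicyWithTemperature(const std::vector<double>& logits,
                                          double pst) {
  // Subtract the largest logit before exponentiating for numerical stability.
  const double max_logit = *std::max_element(logits.begin(), logits.end());
  std::vector<double> probs;
  probs.reserve(logits.size());
  double total = 0.0;
  for (double logit : logits) {
    const double w = std::exp((logit - max_logit) / pst);
    probs.push_back(w);
    total += w;
  }
  // Normalize so the probabilities sum to 1.
  for (double& p : probs) p /= total;
  return probs;
}
```

With `pst = 1.0` this is the plain softmax, i.e. the network's original P values, which is why PST 1.0 is recommended above.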


danegraphics · 2022-04-16T07:41:20Z

Added a function to square the P values, but learned that it is not necessary thanks to PST, so I will be reverting that change soon.
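Squaring P is a special case of what PST already does. Assuming the P values come from a softmax of the raw logits $z$ at PST 1.0, squaring and renormalizing gives the same distribution as PST 0.5:

$$
\frac{P_i^2}{\sum_j P_j^2}
= \frac{\left(e^{z_i}/Z\right)^2}{\sum_j \left(e^{z_j}/Z\right)^2}
= \frac{e^{2 z_i}}{\sum_j e^{2 z_j}}
= \operatorname{softmax}\!\left(\frac{z}{0.5}\right)_i,
\qquad Z = \sum_k e^{z_k}.
$$

Under the same assumption, raising P to the power $1/T$ and renormalizing is equivalent to using PST $T$.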

danegraphics · 2022-04-17T06:17:46Z

The change has been reverted. The feature and code are now ready for merge.

mooskagh

What I don't fully like is that, by default, with this option enabled Lc0 will still think for a long time, but in the end the result of that calculation will not be used...

So maybe consider these options:

Option 1
Make this option also disable search (and maybe rename it to something like MaiaMode)

Option 2 (I like it more)
Change the function GetBestRootChildWithTemperature to select the move by P when none of the child nodes has any N. That way it will work "naturally": it will use P instead of N if search is turned off. It will also make sense to use temperature to scale the P values, so all the standard temperature curve functionality would automatically work. Even without a per-move curve, it would be possible to control randomness gradually instead of having a "best P vs. randomly weighted by P" switch.
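Option 2 boils down to choosing which quantity the temperature pick is weighted by. A sketch with a hypothetical `ChildEdge` struct (not Lc0's real edge/node classes), showing only the N-versus-P fallback; the temperature scaling itself is left out:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified stand-in for a root child edge:
// visit count N and policy prior P.
struct ChildEdge {
  int n;
  float p;
};

// Returns the weights a temperature-based pick would use: visit counts when
// any child has been visited, otherwise the policy priors (e.g. search
// disabled, or the time budget ran out before any visit).
std::vector<double> SelectionWeights(const std::vector<ChildEdge>& edges) {
  const bool any_visits = std::any_of(
      edges.begin(), edges.end(), [](const ChildEdge& e) { return e.n > 0; });
  std::vector<double> weights;
  weights.reserve(edges.size());
  for (const ChildEdge& e : edges) {
    weights.push_back(any_visits ? static_cast<double>(e.n) : e.p);
  }
  return weights;
}
```

Because the switch depends only on whether any N is nonzero, the same code path covers both "no search" and "search that produced no visits".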

mooskagh · 2022-04-22T09:40:14Z

src/mcts/params.cc

@@ -99,6 +99,9 @@ const OptionId SearchParams::kTwoFoldDrawsId{
    "Evaluates twofold repetitions in the search tree as draws. Visits to "
    "these positions are reverted when the first occurrence is played "
    "and not in the search tree anymore."};
+const OptionId SearchParams::kRandombyP{


kRandombyP -> kRandomByPId (note the Id suffix).
Also, to be consistent with the rest of the flags, probably "random-by-p" and "RandomByP".

But actually, I think the name can be improved. MoveSelectionByP maybe? Or even MaiaMode.

Oh, I missed that! Thank you!

mooskagh · 2022-04-22T09:41:07Z

src/mcts/params.h

@@ -54,6 +54,7 @@ class SearchParams {
    return at_root ? kCpuctFactorAtRoot : kCpuctFactor;
  }
  bool GetTwoFoldDraws() const { return kTwoFoldDraws; }
+  float GetRandombyP() const { return options_.Get<bool>(kRandombyP); }


Also here I'd write GetRandomByP with capital "B"

Yes, I'll add that.
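Putting the suggested naming together, and also giving the getter a `bool` return type to match the `Get<bool>` call (the quoted hunk declares it `float`). `OptionId` and `OptionsDict` below are minimal hypothetical stubs, not Lc0's real classes, and the help text is invented for illustration:

```cpp
#include <map>
#include <string>

// Hypothetical stub: long flag, UCI option name and help text.
struct OptionId {
  const char* long_flag;
  const char* uci_option;
  const char* help_text;
};

// Hypothetical stub that only stores boolean options.
class OptionsDict {
 public:
  template <typename T>
  T Get(const OptionId& id) const;
  void SetBool(const OptionId& id, bool value) { bools_[id.uci_option] = value; }

 private:
  std::map<std::string, bool> bools_;
};

template <>
bool OptionsDict::Get<bool>(const OptionId& id) const {
  auto it = bools_.find(id.uci_option);
  return it != bools_.end() && it->second;  // Unset options read as false.
}

class SearchParams {
 public:
  static const OptionId kRandomByPId;
  explicit SearchParams(const OptionsDict& options) : options_(options) {}
  // Capital "B", and a bool return type matching the option's type.
  bool GetRandomByP() const { return options_.Get<bool>(kRandomByPId); }

 private:
  const OptionsDict& options_;
};

const OptionId SearchParams::kRandomByPId{
    "random-by-p", "RandomByP",
    "Play a random move weighted by the policy (P) output."};
```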

mooskagh · 2022-04-22T09:44:31Z

src/mcts/search.cc

+  // In case of floating point subtraction issues above.
+  return *root_node_->Edges();
+
+  assert(false);


The assert is not needed, as there is a return above it.

I'll remove the assert. I forgot to remove it after I fixed the return.
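The final `return` in the quoted hunk is a fallback for when rounding leaves the roll at or above the running total. A self-contained sketch of that cumulative-sum (roulette-wheel) selection, using a plain vector of P values and a hypothetical `PickIndexByWeight` helper instead of Lc0's edge iterators. Unlike the hunk, this sketch falls back to the last entry with a positive weight, so a zero-probability move is never returned:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: returns the index of an entry chosen with probability
// proportional to its weight. Weights must be non-negative with a positive sum.
std::size_t PickIndexByWeight(const std::vector<float>& weights,
                              std::mt19937& rng) {
  float total = 0.0f;
  for (float w : weights) total += w;
  // Roll a point in [0, total) and walk the cumulative sum until it is passed.
  std::uniform_real_distribution<float> dist(0.0f, total);
  const float roll = dist(rng);
  float cumulative = 0.0f;
  std::size_t last_positive = 0;
  for (std::size_t i = 0; i < weights.size(); ++i) {
    if (weights[i] <= 0.0f) continue;
    last_positive = i;
    cumulative += weights[i];
    if (roll < cumulative) return i;
  }
  // Floating-point rounding can leave `roll` at or above the final cumulative
  // sum; fall back to the last entry that had a positive weight.
  return last_positive;
}
```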

mooskagh · 2022-04-22T09:45:20Z

src/mcts/search.cc

+  // Get sum of weights for roll.
+  float total_weights = 0.0;
+  for (auto& edge : root_node_->Edges()) {
+    total_weights += edge.GetP();


I think the sum of P is always 1.0, so there is no need to sum them.

Is it? I'll have to look into that. If so, then I'll just remove this section.

The sum of P is only not 1.0 if there are moves which lead to a forced loss, as these get their policy zeroed. However, this will only happen with some amount of search.

I'd still propose keeping this logic and using it to introduce a parameter for a policy cutoff, as suggested in #1734, so that absolute blunders can be suppressed.
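If the sum of P can fall below 1.0 (zeroed forced-loss moves) or a cutoff removes some moves, the remaining weights do not have to be normalized by hand: `std::discrete_distribution` normalizes whatever non-negative weights it is given. A sketch of the cutoff idea; `PickWithPolicyCutoff` and its `min_p` parameter are hypothetical, not existing Lc0 code or options:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch: samples a move index weighted by P, ignoring moves
// whose prior is below `min_p`. If the cutoff would remove every move, the
// raw priors are used instead. `priors` must contain at least one positive value.
std::size_t PickWithPolicyCutoff(const std::vector<double>& priors,
                                 double min_p, std::mt19937& rng) {
  std::vector<double> weights(priors.size(), 0.0);
  double kept = 0.0;
  for (std::size_t i = 0; i < priors.size(); ++i) {
    if (priors[i] >= min_p) {
      weights[i] = priors[i];
      kept += priors[i];
    }
  }
  if (kept <= 0.0) weights = priors;
  // discrete_distribution divides by the sum itself, so the weights need not
  // add up to 1.0.
  std::discrete_distribution<int> dist(weights.begin(), weights.end());
  return static_cast<std::size_t>(dist(rng));
}
```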

mooskagh · 2022-04-22T09:48:07Z

src/mcts/search.cc

@@ -621,6 +622,7 @@ void Search::EnsureBestMoveKnown() REQUIRES(nodes_mutex_)
  auto bestmove_edge = temperature
                           ? GetBestRootChildWithTemperature(temperature)
                           : GetBestChildNoTemperature(root_node_, 0);
+  if (randombyp) bestmove_edge = GetRandomChildbyP();


It seems wasteful to call the standard GetBestChild..() functions and then overwrite the result with GetRandomChildByP.
Maybe combine them into one expression?

auto bestmove_edge = params_.GetRandomByP() ? GetRandomChildByP() : temperature ? GetBestRootWithTemp... : GetBestNoTemp..;

That's a much better way of doing it. Thank you!
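The combined expression evaluates only the selected branch, so nothing is computed and then thrown away. A self-contained illustration with hypothetical stubs standing in for GetRandomChildByP, GetBestRootChildWithTemperature and GetBestChildNoTemperature (they only record that they were called):

```cpp
#include <string>
#include <vector>

// Records which selection routine ran; stands in for real move selection.
std::vector<std::string> calls;

std::string RandomChildByP() { calls.push_back("by_p"); return "by_p"; }
std::string BestWithTemperature(float) { calls.push_back("temp"); return "temp"; }
std::string BestNoTemperature() { calls.push_back("no_temp"); return "no_temp"; }

// Mirrors the reviewer's nested conditional: exactly one routine is called.
// A nonzero temperature converts to true in the condition.
std::string ChooseMove(bool random_by_p, float temperature) {
  return random_by_p   ? RandomChildByP()
         : temperature ? BestWithTemperature(temperature)
                       : BestNoTemperature();
}
```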

danegraphics · 2022-04-22T13:00:00Z

I definitely considered Option 1. I will look into that one.

Option 2, if I'm understanding it correctly, removes the ability to run the engine with 1 node without randomness. Even if someone set PST to 0.1 (the lowest accepted value, according to some documentation?), it would still introduce undesirable randomness in that situation.

It's also a bit unintuitive for the engine to suddenly introduce randomness when it's set to a 1 node limit, even if that randomness could be disabled through PST.

So I feel like it's better to have a switch where it's either random or not random, and the amount of randomness is controlled by PST.

Unless I misunderstood, I'm going to go with Option 1 and disable search when this option is enabled.

Also, I'm not sure MaiaMode is a good name, considering that this mode can be used with all kinds of other nets or for training. It also isn't part of Maia's originally intended functionality, so implying that it should be used whenever a Maia net is used feels odd. MoveSelectionByP also feels wrong, because it implies that it will always select the best move according to the policy. I'll think about the name a bit more, but I feel it should be general and imply that it makes a random selection based on the policy output.

mooskagh · 2022-04-22T13:34:13Z

In Option 2, by default it will just play the move with the highest P (in fact, GetBestRootChildWithTemperature won't be called at all; GetBestChildNoTemperature will be called instead).

But if the temperature is set to 1, it will play the move according to probabilities (and other variants are also possible, like 0.5 to be something in the middle, and +inf to play all moves with the same probability).

That way it would be possible not to introduce additional parameters at all, and reuse the existing parameters controlling randomness (namely, temperature).
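The temperature behavior described here can be modeled by weighting each move by P^(1/T): T = 0 degenerates to taking the highest P, T = 1 reproduces P, and a very large T pushes all positive weights toward the same value. This is a simplified model with a hypothetical `TemperatureWeights` helper; Lc0's actual temperature code (with its per-move curve) is not reproduced here:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: turns priors P into normalized selection probabilities
// proportional to P^(1/T). T <= 0 is treated as "pick the highest P" (a
// one-hot result). At least one prior must be positive.
std::vector<double> TemperatureWeights(const std::vector<double>& priors,
                                       double temperature) {
  std::vector<double> out(priors.size(), 0.0);
  if (priors.empty()) return out;
  if (temperature <= 0.0) {
    const auto best = std::max_element(priors.begin(), priors.end());
    out[static_cast<std::size_t>(best - priors.begin())] = 1.0;
    return out;
  }
  double total = 0.0;
  for (std::size_t i = 0; i < priors.size(); ++i) {
    // Moves with P == 0 stay at 0 for any finite temperature.
    out[i] = std::pow(priors[i], 1.0 / temperature);
    total += out[i];
  }
  for (double& w : out) w /= total;
  return out;
}
```

Note that in this model a move with P exactly 0 never becomes playable, so "all moves with the same probability" applies to the moves with nonzero P.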

danegraphics · 2022-04-22T14:08:31Z

Oh, I think I understand what you're saying now.

You're saying that if the node limit is 1, and a temperature is set, then I could set the default functionality to be choosing by P?

I'm trying to wrap my head around how that would work a bit.

I would still need to set the policy softmax temperature in order to get the desired probabilities (1.0 for default probabilities), and that would actually be the control for how much randomness there is, so I would be essentially ignoring temperature in the process, even though I set it to something.

Or rather, setting temperature to anything other than 0.0 would be enabling the functionality assuming the depth of search is only 1 node, and the policy softmax temperature would be the randomness control.

So after adding that functionality, it would look like ./lc0 --temperature=1.0 --policy-softmax-temp=1.0 and then go nodes 1 and it would work?

Am I understanding this correctly?

I feel like the desired functionality would be simply setting the temperature and using that as the control, but the policy softmax temperature also plays a part and would need to be changed before the evaluation begins.

danegraphics · 2022-04-22T16:03:20Z

Another thing that bothers me about using the temperature setting instead of an explicit parameter is that it creates a kind of "hidden" functionality in the engine that requires knowledge of the code base, unless it is written out in the documentation for temperature.

I understand that it's more efficient to use existing parameters to activate it as far as modification to code goes, but as far as usability goes, it's extremely unintuitive to the point of almost seeming intentionally obscured.

And considering that in either case, policy softmax temp requires setting anyway, I would rather go with adding a readable parameter than obscuring the functionality within an edge case of a function that serves a different purpose.

Because of this, I feel that keeping and using the parameter RandomByP is a better option.

mooskagh · 2022-04-22T17:30:48Z

The more I think about it, the more I like the idea of integrating it into GetBestRootChildWithTemperature :), just to do a temperature-based pick based on P instead of N if all N=0.

It's not hidden actually, to me it looks like it should be the expected way for users.
Users know that when they need randomness, they tune temperature parameters, and there should be no difference whether there is a search or not (or e.g. if search is enabled but didn't happen due to low time budget).
So having separate ways to enable randomness in different modes looks confusing.

--policy-softmax-temp is an irrelevant flag here; it affects MCTS visits rather than move selection.

I've just attempted to start a discussion on Discord, but no one has responded so far.

danegraphics · 2022-04-22T19:00:38Z

A better method has been decided and a different pull request will be created.

danegraphics added 13 commits April 15, 2022 14:06

Update search.cc

387ba19

Add randombyp functionality

Update search.h

d86db19

Add randombyp functionality.

Update params.h

5a5e355

Add randombyp functionality.

Update params.cc

63f0034

Add randombyp functionality.

Add randombyp functionality.

1d29e74

Add randombyp functionality.

6e3f90b

Add randombyp functionality.

08d1a6f

Add randombyp functionality.

592afa9

Update comment formatting on GetRandomChildbyP.

5948a03

Correct formatting.

a685fe2

Fix for floating point arithmetic.

6400ac1

Fixed fix for floating point issues.

824d826

removed unecessary new lines

95a65fb

danegraphics changed the title ~~Functionality for choosing a random move weighted by probability outputs.~~ (WIP) Functionality for choosing a random move weighted by probability outputs. Apr 16, 2022

danegraphics force-pushed the randombyp branch from e0696cb to 95a65fb April 17, 2022 06:16

danegraphics changed the title ~~(WIP) Functionality for choosing a random move weighted by probability outputs.~~ Functionality for choosing a random move weighted by probability outputs. Apr 17, 2022

danegraphics changed the title ~~Functionality for choosing a random move weighted by probability outputs.~~ Functionality for choosing a random move weighted by P outputs. Apr 17, 2022

mooskagh reviewed Apr 22, 2022


danegraphics changed the title ~~Functionality for choosing a random move weighted by P outputs.~~ (WIP) Functionality for choosing a random move weighted by P outputs. Apr 22, 2022

danegraphics added 2 commits April 22, 2022 09:19

Variable name and logic improvements.

0c6f5a0

Variable name correction.

ce8961f

Naphthalin mentioned this pull request Apr 22, 2022

Extracting parts of Lc0's search into classes would help future development #1734

Open

danegraphics closed this Apr 22, 2022

danegraphics deleted the randombyp branch April 23, 2022 12:51
