-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extracting parts of Lc0's search into classes would help future development #1734
Comments
1. Selection of move to play. Current strategies:
Additional strategies:
The class(es) would need to employ
This could be done either by looping over root's child nodes, or by giving an array with all relevant information.
|
2. Selection of move to explore. This probably is the hardest of them, but also the most interesting one for future development. Current strategies:
Additional strategies:
General remarks:
The class(es) would need to employ the following functions:
|
3. Result Propagation Current strategy:
Other strategies:
The class(es) would need to employ the following functions:
|
4. Node value recalculation This doesn't yet exist, with the small exception of reverting twofold draw visits to restore balance to the tree, though there are known side effects in fast TCs (#1466, fix attempt in #1513). A node recalculation would be necessary in all strategies which can't be fully represented as incremental updates. In general, it means calculating Q/D/M of a node from its children, which effectively allows pruning of branches which turned out bad, and hence also allow exploration strategies which don't converge on mostly exploring the best moves. #963 and the mentioned However, a big caveat is that this can lead to non-incremental changes in Q, which interacts badly with PUCT, as the latter relies on incremental updates in order for the PUCT score to (semi-)reliably spread the visits across moves. This means that likely neither a recalculation loop nor a replacement for PUCT will lead to an improvement without the other, and they need to be tested together. The class(es) would need to employ the following functions:
|
There are definitely a large number of different paradigms that would be nice to explore and test. Making the code more modular in this way would definitely make it much easier to do so. More specifically, being able to experiment more easily with the search, value calculation, and the integration of the two would help massively. Currently, by the nature of MCTS, search is somewhat bound to the assumption that any move worth more exploration is also more worth playing, and vice versa, even though this isn't necessarily true, especially in chess. So the ability to experiment more easily with functions with aims akin to the aims of quiescence, pruning, minimax, and more without needing to overwrite code for existing strategies in the process is definitely something we should aim for. I very much support this proposal. |
The idea is to create a class SearchMethod which will make the process described in LeelaChessZero#1734 modular. The user can just specify 1. Strategy for the selection of the move to play, 2. Strategy for the selection of the move to explore, 3. Strategy for the result propagation, 4. Strategy for the recalculation of node value. It will allow wider experimentation with search by, for example, using NN for some of the choices made.
As of now, Lc0's search is still using AlphaZero's original PUCT, with a few modifications in WDL+draw score, some certainty propagation, and the general batching strategy. This certainly works decently well, but one of the reasons why there has been a lack of progress in the search can be attributed to how difficult it is not only to test search changes (especially if they affect the scaling behaviour), but also how difficult it is to implement them in the first place. I would like to change the latter part with this proposal.
Currently, an experiment in the search code involves
search.cc
andsearch.h
itselfnode.cc
andnode.h
if a new node output is neededparams.cc
andparams.h
search.cc
, which is rather counterintuitive e.g. in move sortingThis effectively renders casual experiments unfeasible, and on top requires a major effort in maintaining branches mergeable with
master
, which increases the difficulty of contributing to Leela's search.I have identified the following aspects of Lc0's search code which IMO would profit from being extracted similar to how
timemgr
, as experiments in the respective fields would then only require creating the associated class file, and registering the new class in the respective file, without touching any of the above files or creating merge complications.Selection of move to play.
Currently, we're using two strategies: most visits, and with temperature. There are other strategies of interest like sampling according to NN policies which are yet to be implemented.Selection of move to explore.
Currently, this uses PUCT, with the addition of drawscore, and some batching logic in multigather and collision scaling. Here, a lot of alternatives are possible (and explored in the literature).Result propagation.
Currently, this follows the general MCTS principle of averaging all outcomes, and calculating these averages in an incremental way. This would need some adaptation for the MCGS implementation at least, and in general when moving away from pure averaging.Node value recalculation.
This doesn't yet exist, but in the case of leaving averaging behind (as done in most MCTS like PUCT and MCGS), and calculating the value of a node from the value of its child nodes e.g. as RENTS or my own Beta-MCTS: a soft hybrid of MCTS and Minimax (wip, not for merge) #963 do it, this would need to be added, and multiple strategies are possible.I will create a post for each of the 4 points, collecting (a) necessary functions for such classes, (b) spell them out at least in pseudo-code (or copy from the code where available), and (c) list all parts of the codes which need replacement.
The text was updated successfully, but these errors were encountered: