Move move filter population to a constructor. #1281

mooskagh · 2020-05-09T10:21:18Z

Tilps · 2020-05-09T11:08:05Z

src/mcts/search.cc

-        if (!root_move_filter_.empty() &&
-            std::find(root_move_filter_.begin(), root_move_filter_.end(),
-                      child.GetMove()) == root_move_filter_.end()) {
+        if (!search_->root_move_filter_.empty() &&


Do we trust the optimizer to pull search_->root_move_filter_ outside of the loop to avoid the extra deref or should we do that manually?

I couldn't make much sense from disassembly, but it seemed that there was an indirection inside the loop, so I moved it out.
But nowadays indirection itself is no longer considered a performance problem, usually it takes a fraction of a CPU cycle (it's indirection to cold memory that is a problem).

Tilps · 2020-05-09T11:08:44Z

Can we get a random backend check with TB disabled to see what effect this has if any in that case?

Tilps

Approved assuming nps change in non-TB scenario is in the noise.

mooskagh · 2020-05-09T12:47:53Z

So far this PR gives -10% nps drop.
But master also gives -5% to what it was when computer was fresh in the morning.

Looking further.

Tilps · 2020-05-09T13:31:23Z

I struggle to see how this could be slower, but maybe the GetBestChild path also similarly needs to avoid the derefs in the loops?

Naphthalin · 2020-05-09T17:04:06Z

(ref) Lc0_c4c92f6 with --backend=random:

Total time (ms) : 340111
Nodes searched  : 42331240
Nodes/second    : 124463

(#1281) Lc0_a4bbcf6 with --backend=random:

Total time (ms) : 340126
Nodes searched  : 42238393
Nodes/second    : 124184

(Win10, i7-4770)

mooskagh · 2020-05-09T18:02:08Z

This is a fix for a bug causing major slowdown when playing in tablebase position.

Due to this chain of calls:
DoBackupUpdate() ->
DoBackupUpdateSingleNode() ->
GetBestChild() (to update best_edge to know whether to output UCI) ->
PopulateRootMoveLimit() ->
syzygy_tb->root_probe()

We probbed the tablebase (root node, both WDL and DTZ) per every visit.
While in fact only needed to do that once per search (so moved that to a Search constructor).

Tilps · 2020-05-10T00:28:10Z

For posterity, we didn't probe tablebase every visit, only on visits that potentially change the 'best child' of root. This is why the problem didn't show up in every TB position.

Tilps · 2020-05-10T06:40:45Z

Except the check was wrong (visits to the current best edge shouldn't need to check if another edge is now best - but it checked even so), so now I am not clear why the problem didn't show up on every TB position :P

* Time management refactoring (LeelaChessZero#1195) * Appended files. * Compiles. * Compiles again. * Make smart pruning use smoothed nps. * Seems to be fully implemented. * Mistype. * One more bug. * Found discrepancy with documentaiton. * Bugfixes. * Don't smooth nps during the first move. * Too large default for timeuse decay. * Bugfix. * Fix build. * Relax defaults a bit. Add fixed to logging. * Remove "smooth" to "smooth-experimental" for now. * MLH verbose stats - Issue 1200 (LeelaChessZero#1230) * Add M effect logic to output section * Fix missing prefixes and semicolons * Some fixes. * Slight format improvement? Co-authored-by: Tilps <[email protected]> * Start TempDecay only after a given number of moves (LeelaChessZero#1212) * Added TempDecayStartMove for starting temp decay only after a given number of moves. This allows keeping initial game up for a few moves and still use decay. * Doesn't allow temperature to fall below endgame temp during temp decay. Still allows initial temp to be below endgame temp. * Doesn't allow temperature to fall below endgame temp during temp decay. Still allows initial temp to be below endgame temp. * Hide temp options * renamed TempDecayStartMove to TempDecayDelayMoves Co-authored-by: Alexis Olson <[email protected]> * Changelog for 0.25.0-rc2. (LeelaChessZero#1233) * Changelog for 0.25.0-rc2. * Add one more PR to the changelog. * Cuda winograd (LeelaChessZero#1228) * custom winograd convolution for cuda backends * custom winograd fixes - fix a bug to make it work for non-SE networks - enable by default only with fp32. * address review comments * remove random line in comment * remove unused constants - W,H are hardcoded to 8 - because there are assumptions in the code based on that. No point in defining constants. * cuda winograd fixes (LeelaChessZero#1238) * cuda winograd fixes - don't typecast directly to half datatype in CPU side code as older CUDA runtime doesn't support that. - don't use gemmEx version on GPUs older than Maxwell generation (not supported). - modify the check to enable custom_winograd setting. It should be faster in most cases - except presently on RTX GPUs when using fp16. * Allow most parts of fen to be optional. (LeelaChessZero#1234) Default to white to move, no castling, no en passant, 0 rule50ply, 1 total move. Also convert other string to std::string and removing using. * Fix UpdateNps to actually smooth the nps and correctly handle time_since_movestart_ms == 0 (LeelaChessZero#1243) * Update changelog for 0.25.0 final release. (LeelaChessZero#1244) * Always report at least 1 depth. (LeelaChessZero#1247) * Fix un-intended regression for GTX GPUs (LeelaChessZero#1246) * memory optimization for cudnn custom_winograd (LeelaChessZero#1250) * memory optimization for cudnn custom_winograd - don't save untransformed weights - print warning message when low memory is detected. * address review comments * fix warning message * fix total weight size calculation 2 layers per residual block! * keep pdb files only for release builds (LeelaChessZero#1256) * doc update (LeelaChessZero#1267) * Include verbose stats for the node. (LeelaChessZero#1268) Use printing lambdas for parts of the verbose output to share between the newly outputted node and its children. * add alphazero time manager (LeelaChessZero#1201) * Updated FLAGS.md with logfile flag (LeelaChessZero#1275) * Fixed a typo in CONTRIBUTING.md (LeelaChessZero#1274) * Update Readme about using git (LeelaChessZero#1265) * Make `wl_` double. (LeelaChessZero#1280) * Move move filter population to a constructor. (LeelaChessZero#1281) * Filter out illegal searchmoves to avoid crashing. (LeelaChessZero#1282) * Clear policy for terminal loss. (LeelaChessZero#1285) * Allow smart pruning to terminate search if win is known. (LeelaChessZero#1284) * Allow smart pruning to terminate search if win is known. * Minor tweak, better safe than sorry. * Fix bug where pv might not update for best move change. (LeelaChessZero#1286) * Fix bug where pv might not update. * Fix... Co-authored-by: Alexander Lyashuk <[email protected]> Co-authored-by: Tilps <[email protected]> Co-authored-by: Naphthalin <[email protected]> Co-authored-by: Ankan Banerjee <[email protected]> Co-authored-by: Ed Lee <[email protected]> Co-authored-by: borg323 <[email protected]> Co-authored-by: Hace <[email protected]> Co-authored-by: Kip Hamiltons <[email protected]> Co-authored-by: nguyenpham <[email protected]>

@Naphthalin

* Time management refactoring (#1195) * Appended files. * Compiles. * Compiles again. * Make smart pruning use smoothed nps. * Seems to be fully implemented. * Mistype. * One more bug. * Found discrepancy with documentaiton. * Bugfixes. * Don't smooth nps during the first move. * Too large default for timeuse decay. * Bugfix. * Fix build. * Relax defaults a bit. Add fixed to logging. * Remove "smooth" to "smooth-experimental" for now. * MLH verbose stats - Issue 1200 (#1230) * Add M effect logic to output section * Fix missing prefixes and semicolons * Some fixes. * Slight format improvement? Co-authored-by: Tilps <[email protected]> * Start TempDecay only after a given number of moves (#1212) * Added TempDecayStartMove for starting temp decay only after a given number of moves. This allows keeping initial game up for a few moves and still use decay. * Doesn't allow temperature to fall below endgame temp during temp decay. Still allows initial temp to be below endgame temp. * Doesn't allow temperature to fall below endgame temp during temp decay. Still allows initial temp to be below endgame temp. * Hide temp options * renamed TempDecayStartMove to TempDecayDelayMoves Co-authored-by: Alexis Olson <[email protected]> * Changelog for 0.25.0-rc2. (#1233) * Changelog for 0.25.0-rc2. * Add one more PR to the changelog. * Cuda winograd (#1228) * custom winograd convolution for cuda backends * custom winograd fixes - fix a bug to make it work for non-SE networks - enable by default only with fp32. * address review comments * remove random line in comment * remove unused constants - W,H are hardcoded to 8 - because there are assumptions in the code based on that. No point in defining constants. * cuda winograd fixes (#1238) * cuda winograd fixes - don't typecast directly to half datatype in CPU side code as older CUDA runtime doesn't support that. - don't use gemmEx version on GPUs older than Maxwell generation (not supported). - modify the check to enable custom_winograd setting. It should be faster in most cases - except presently on RTX GPUs when using fp16. * Allow most parts of fen to be optional. (#1234) Default to white to move, no castling, no en passant, 0 rule50ply, 1 total move. Also convert other string to std::string and removing using. * Fix UpdateNps to actually smooth the nps and correctly handle time_since_movestart_ms == 0 (#1243) * Update changelog for 0.25.0 final release. (#1244) * Always report at least 1 depth. (#1247) * Fix un-intended regression for GTX GPUs (#1246) * memory optimization for cudnn custom_winograd (#1250) * memory optimization for cudnn custom_winograd - don't save untransformed weights - print warning message when low memory is detected. * address review comments * fix warning message * fix total weight size calculation 2 layers per residual block! * keep pdb files only for release builds (#1256) * doc update (#1267) * Include verbose stats for the node. (#1268) Use printing lambdas for parts of the verbose output to share between the newly outputted node and its children. * add alphazero time manager (#1201) * Updated FLAGS.md with logfile flag (#1275) * Fixed a typo in CONTRIBUTING.md (#1274) * Update Readme about using git (#1265) * Make `wl_` double. (#1280) * Move move filter population to a constructor. (#1281) * Filter out illegal searchmoves to avoid crashing. (#1282) * Clear policy for terminal loss. (#1285) * Allow smart pruning to terminate search if win is known. (#1284) * Allow smart pruning to terminate search if win is known. * Minor tweak, better safe than sorry. * Fix bug where pv might not update for best move change. (#1286) * Fix bug where pv might not update. * Fix... * Catch up to master (#6) * Time management refactoring (#1195) * Appended files. * Compiles. * Compiles again. * Make smart pruning use smoothed nps. * Seems to be fully implemented. * Mistype. * One more bug. * Found discrepancy with documentaiton. * Bugfixes. * Don't smooth nps during the first move. * Too large default for timeuse decay. * Bugfix. * Fix build. * Relax defaults a bit. Add fixed to logging. * Remove "smooth" to "smooth-experimental" for now. * MLH verbose stats - Issue 1200 (#1230) * Add M effect logic to output section * Fix missing prefixes and semicolons * Some fixes. * Slight format improvement? Co-authored-by: Tilps <[email protected]> * Start TempDecay only after a given number of moves (#1212) * Added TempDecayStartMove for starting temp decay only after a given number of moves. This allows keeping initial game up for a few moves and still use decay. * Doesn't allow temperature to fall below endgame temp during temp decay. Still allows initial temp to be below endgame temp. * Doesn't allow temperature to fall below endgame temp during temp decay. Still allows initial temp to be below endgame temp. * Hide temp options * renamed TempDecayStartMove to TempDecayDelayMoves Co-authored-by: Alexis Olson <[email protected]> * Changelog for 0.25.0-rc2. (#1233) * Changelog for 0.25.0-rc2. * Add one more PR to the changelog. * Cuda winograd (#1228) * custom winograd convolution for cuda backends * custom winograd fixes - fix a bug to make it work for non-SE networks - enable by default only with fp32. * address review comments * remove random line in comment * remove unused constants - W,H are hardcoded to 8 - because there are assumptions in the code based on that. No point in defining constants. * cuda winograd fixes (#1238) * cuda winograd fixes - don't typecast directly to half datatype in CPU side code as older CUDA runtime doesn't support that. - don't use gemmEx version on GPUs older than Maxwell generation (not supported). - modify the check to enable custom_winograd setting. It should be faster in most cases - except presently on RTX GPUs when using fp16. * Allow most parts of fen to be optional. (#1234) Default to white to move, no castling, no en passant, 0 rule50ply, 1 total move. Also convert other string to std::string and removing using. * Fix UpdateNps to actually smooth the nps and correctly handle time_since_movestart_ms == 0 (#1243) * Update changelog for 0.25.0 final release. (#1244) * Always report at least 1 depth. (#1247) * Fix un-intended regression for GTX GPUs (#1246) * memory optimization for cudnn custom_winograd (#1250) * memory optimization for cudnn custom_winograd - don't save untransformed weights - print warning message when low memory is detected. * address review comments * fix warning message * fix total weight size calculation 2 layers per residual block! * keep pdb files only for release builds (#1256) * doc update (#1267) * Include verbose stats for the node. (#1268) Use printing lambdas for parts of the verbose output to share between the newly outputted node and its children. * add alphazero time manager (#1201) * Updated FLAGS.md with logfile flag (#1275) * Fixed a typo in CONTRIBUTING.md (#1274) * Update Readme about using git (#1265) * Make `wl_` double. (#1280) * Move move filter population to a constructor. (#1281) * Filter out illegal searchmoves to avoid crashing. (#1282) * Clear policy for terminal loss. (#1285) * Allow smart pruning to terminate search if win is known. (#1284) * Allow smart pruning to terminate search if win is known. * Minor tweak, better safe than sorry. * Fix bug where pv might not update for best move change. (#1286) * Fix bug where pv might not update. * Fix... Co-authored-by: Alexander Lyashuk <[email protected]> Co-authored-by: Tilps <[email protected]> Co-authored-by: Naphthalin <[email protected]> Co-authored-by: Ankan Banerjee <[email protected]> Co-authored-by: Ed Lee <[email protected]> Co-authored-by: borg323 <[email protected]> Co-authored-by: Hace <[email protected]> Co-authored-by: Kip Hamiltons <[email protected]> Co-authored-by: nguyenpham <[email protected]> * Change defaults and unhide MLH options * Update values per @Naphthalin's comments Co-authored-by: Alexander Lyashuk <[email protected]> Co-authored-by: Tilps <[email protected]> Co-authored-by: Naphthalin <[email protected]> Co-authored-by: Ankan Banerjee <[email protected]> Co-authored-by: Ed Lee <[email protected]> Co-authored-by: borg323 <[email protected]> Co-authored-by: Hace <[email protected]> Co-authored-by: Kip Hamiltons <[email protected]> Co-authored-by: nguyenpham <[email protected]>

mooskagh added 2 commits May 9, 2020 12:20

Move move filter population to a constructor.

2c58569

Initialize TB_hits to 1 if root position is a TB hit.

2ef7171

mooskagh requested a review from Tilps May 9, 2020 10:58

mooskagh marked this pull request as ready for review May 9, 2020 10:58

Tilps reviewed May 9, 2020

View reviewed changes

Tilps approved these changes May 9, 2020

View reviewed changes

Move indirection from inside the loop.

a4bbcf6

Forgot to delete something.

bde0357

mooskagh merged commit 0622049 into LeelaChessZero:master May 9, 2020

mooskagh deleted the tb_performance branch May 9, 2020 17:33

Mardak mentioned this pull request May 9, 2020

Filter out illegal searchmoves. #1282

Merged

AlexisOlson pushed a commit to AlexisOlson/lc0 that referenced this pull request May 10, 2020

Move move filter population to a constructor. (LeelaChessZero#1281)

d5c9328

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move move filter population to a constructor. #1281

Move move filter population to a constructor. #1281

mooskagh commented May 9, 2020 •

edited

Loading

Tilps May 9, 2020

mooskagh May 9, 2020

Tilps commented May 9, 2020

Tilps left a comment

mooskagh commented May 9, 2020

Tilps commented May 9, 2020

Naphthalin commented May 9, 2020 •

edited

Loading

mooskagh commented May 9, 2020 •

edited

Loading

Tilps commented May 10, 2020

Tilps commented May 10, 2020

Move move filter population to a constructor. #1281

Move move filter population to a constructor. #1281

Conversation

mooskagh commented May 9, 2020 • edited Loading

Tilps May 9, 2020

Choose a reason for hiding this comment

mooskagh May 9, 2020

Choose a reason for hiding this comment

Tilps commented May 9, 2020

Tilps left a comment

Choose a reason for hiding this comment

mooskagh commented May 9, 2020

Tilps commented May 9, 2020

Naphthalin commented May 9, 2020 • edited Loading

mooskagh commented May 9, 2020 • edited Loading

Tilps commented May 10, 2020

Tilps commented May 10, 2020

mooskagh commented May 9, 2020 •

edited

Loading

Naphthalin commented May 9, 2020 •

edited

Loading

mooskagh commented May 9, 2020 •

edited

Loading