v0.31.0
In this version:
- The blas, cuda, eigen, metal and onnx backends now have support for multihead network architecture and can run BT3/BT4 nets.
- Updated the internal Elo model to better align with regular Elo for human players.
- There is a new XLA backend that uses the OpenXLA compiler to produce code to execute the neural network. See https://github.com/LeelaChessZero/lc0/wiki/XLA-backend for details. Related to this are new leela2onnx options to output the HLO format that XLA understands.
- There is a vastly simplified lc0 interface available by renaming the executable to `lc0simple`.
- The backends can now suggest a minibatch size to the search; this is enabled by `--minibatch-size=0` (the new default).
- If the cudnn backend detects an unsupported network architecture, it will switch to the cuda backend.
- Two new selfplay options enable value and policy tournaments. A policy tournament uses a single node policy evaluation to select the move to play, while a value tournament searches all possible moves at depth 1 and selects the one with the best q.
- While it is easy to get a single node policy evaluation (`go nodes 1` using uci), there was no simple way to get the effect of a value only evaluation, so the `--value-only` option was added.
- Button uci options were implemented and a button to clear the tree was added (as a hidden option).
- Support for the uci `go mate` option was added.
- The rescorer can now be built from the lc0 code base instead of a separate branch.
- A discrete onnx layernorm implementation was added to get around an onnxruntime bug with directml. This has some overhead, so it is only enabled for onnx-dml and can be switched off with the `alt_layernorm=false` backend option.
- The `--onnx2pytorch` option was added to leela2onnx to generate pytorch compatible models.
- There is a cuda `min_batch` backend option to reduce non-determinism with small batches.
- New options were added to onnx2leela to fix tf exported onnx models.
- The onnx backend can now be built for amd's rocm.
- Fixed a bug where the Contempt effect on eval was too low for nets with natively higher draw rates.
- Made the WDL Rescale sharpness limit configurable via the `--wdl-max-s` hidden option.
- The number of search task workers can be set automatically, to either 0 for cpu backends or up to 4 depending on the number of cpu cores. This is enabled by `--task-workers=-1` (the new default).
- Changed cuda compilation options to use `-arch=native` or `-arch=all-major` if no specific version is requested, with a fallback for older cuda versions that don't support those options.
- Updated android builds to use openblas 0.3.27.
- The `WDLDrawRateTarget` option now accepts the value 0 (new default) to retain raw WDL values if `WDLCalibrationElo` is set to 0 (default).
- Improvements to the verbose move stats if `WDLEvalObjectivity` is used.
- The centipawn score is displayed by default for old nets without WDL output.
- Several assorted fixes and code cleanups.
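
The difference between the policy and value tournament modes mentioned above can be illustrated with a minimal sketch. This is not lc0 code: the function names, the dictionary layout and the sample numbers are hypothetical, and it assumes q is expressed from the point of view of the side choosing the move, so that higher is better.

```python
from typing import Dict


def pick_policy_move(policy: Dict[str, float]) -> str:
    """Policy tournament: play the move with the highest policy prior
    from a single node evaluation of the current position."""
    return max(policy, key=policy.get)


def pick_value_move(child_q: Dict[str, float]) -> str:
    """Value tournament: every legal move is evaluated at depth 1 and
    the move with the best q is played (assumed: higher q is better)."""
    return max(child_q, key=child_q.get)


# Hypothetical evaluation of one position (illustrative numbers only).
policy = {"e2e4": 0.45, "d2d4": 0.40, "g1f3": 0.15}
child_q = {"e2e4": 0.02, "d2d4": 0.05, "g1f3": 0.01}

policy_choice = pick_policy_move(policy)  # "e2e4"
value_choice = pick_value_move(child_q)   # "d2d4"
```

The two modes can disagree on the same position, as they do here, which is what makes them useful for comparing the policy and value heads separately.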