Remove classical evaluation #4674

vondele · 2023-07-11T15:07:02Z

Since the introduction of NNUE (first released with Stockfish 12), we have maintained the classical evaluation as part of SF in frozen form. The idea that this code could lead to further inputs to the NN or search did not materialize. Now, after five releases, this PR removes the classical evaluation from SF. Even though this evaluation is probably the best of its class, it has become unimportant for the engine's strength, and there is little need to maintain this code (roughly 25% of SF) going forward, or to expend resources on trying to improve its integration with the NNUE eval.

Indeed, it still had a very limited use in current SF, namely for evaluating positions that are nearly decided by material difference, where the speed of the classical evaluation outweighs its inaccuracies. The impact on strength is small, roughly 2 Elo, and its importance probably decreases as the TC grows.
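The fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Stockfish's code: the `Position` type, its `material_cp` field, the `DECISIVE_MATERIAL_CP` cutoff, and the stub evaluators are all hypothetical, and the PR does not state the exact selection rule or threshold.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical cutoff (centipawns): beyond this material imbalance the position
# is treated as "nearly decided". Illustrative value only, not taken from Stockfish.
DECISIVE_MATERIAL_CP = 2000


@dataclass
class Position:
    # Hypothetical field: material balance from the side to move's point of view.
    material_cp: int


Evaluator = Callable[[Position], int]


def hybrid_evaluate(pos: Position,
                    fast_eval: Evaluator,
                    accurate_eval: Evaluator) -> Tuple[int, str]:
    """Return (score, evaluator_name).

    In lopsided positions the cheap evaluation is "good enough" and its speed
    wins; everywhere else the slower, more accurate network evaluation is used.
    """
    if abs(pos.material_cp) > DECISIVE_MATERIAL_CP:
        return fast_eval(pos), "classical"
    return accurate_eval(pos), "nnue"


if __name__ == "__main__":
    # Stub evaluators standing in for the real classical and NNUE code.
    classical = lambda p: p.material_cp
    nnue = lambda p: p.material_cp // 2
    print(hybrid_evaluate(Position(2500), classical, nnue))  # (2500, 'classical')
    print(hybrid_evaluate(Position(300), classical, nnue))   # (150, 'nnue')
```

In this model, removing the classical evaluation amounts to always taking the second branch, giving up the speed of the first one in lopsided positions.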

Potentially, removing this code could lead to the development of techniques for faster, but less accurate, NN evaluation of certain positions.

STC
https://tests.stockfishchess.org/tests/view/64a320173ee09aa549c52157
Elo: -2.35 ± 1.1 (95%) LOS: 0.0%
Total: 100000 W: 24916 L: 25592 D: 49492
Ptnml(0-2): 287, 12123, 25841, 11477, 272
nElo: -4.62 ± 2.2 (95%) PairsRatio: 0.95

LTC
https://tests.stockfishchess.org/tests/view/64a320293ee09aa549c5215b
Elo: -1.74 ± 1.0 (95%) LOS: 0.0%
Total: 100000 W: 25010 L: 25512 D: 49478
Ptnml(0-2): 44, 11069, 28270, 10579, 38
nElo: -3.72 ± 2.2 (95%) PairsRatio: 0.96

VLTC SMP
https://tests.stockfishchess.org/tests/view/64a3207c3ee09aa549c52168
Elo: -1.70 ± 0.9 (95%) LOS: 0.0%
Total: 100000 W: 25673 L: 26162 D: 48165
Ptnml(0-2): 8, 9455, 31569, 8954, 14
nElo: -3.95 ± 2.2 (95%) PairsRatio: 0.95
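The summary figures above can be recomputed from the raw counts. The sketch below is a reconstruction, not Fishtest's code: it assumes the logistic Elo formula, a normalized Elo equal to the score excess divided by an effective per-game standard deviation derived from the pentanomial counts (scaled by 800/ln 10), and PairsRatio as pairs won (1.5 or 2 points) over pairs lost (0 or 0.5 points). Under those assumptions it reproduces the reported Elo, nElo, and PairsRatio values; the ± error bars and LOS are not computed.

```python
import math
from typing import Dict, Sequence, Tuple

# Scale factor converting a normalized score excess into Elo-like units.
ELO_SCALE = 800 / math.log(10)

# Ptnml(0-2) counts: game pairs scoring 0, 0.5, 1, 1.5 and 2 points.
TESTS: Dict[str, Tuple[int, ...]] = {
    "STC": (287, 12123, 25841, 11477, 272),
    "LTC": (44, 11069, 28270, 10579, 38),
    "VLTC SMP": (8, 9455, 31569, 8954, 14),
}


def score_from_wld(w: int, l: int, d: int) -> float:
    """Mean score per game from win/loss/draw totals."""
    return (w + d / 2) / (w + l + d)


def score_from_pentanomial(ptnml: Sequence[int]) -> float:
    """Mean score per game; a pair scoring k/2 points has per-game score k/4."""
    return sum(0.25 * k * n for k, n in enumerate(ptnml)) / sum(ptnml)


def logistic_elo(score: float) -> float:
    """Elo difference implied by an expected score under the logistic model."""
    return -400 * math.log10(1 / score - 1)


def normalized_elo(ptnml: Sequence[int]) -> float:
    """Score excess over 0.5 divided by the per-game standard deviation."""
    pairs = sum(ptnml)
    mean = score_from_pentanomial(ptnml)
    var_pair = sum(n * (0.25 * k - mean) ** 2 for k, n in enumerate(ptnml)) / pairs
    # A pair score is the mean of two games, so the effective per-game
    # standard deviation is sqrt(2) times the per-pair one.
    sigma_game = math.sqrt(2 * var_pair)
    return ELO_SCALE * (mean - 0.5) / sigma_game


def pairs_ratio(ptnml: Sequence[int]) -> float:
    """Pairs won (1.5 or 2 points) divided by pairs lost (0 or 0.5 points)."""
    return (ptnml[3] + ptnml[4]) / (ptnml[0] + ptnml[1])


if __name__ == "__main__":
    for name, ptnml in TESTS.items():
        elo = logistic_elo(score_from_pentanomial(ptnml))
        print(f"{name}: Elo {elo:+.2f}  nElo {normalized_elo(ptnml):+.2f}  "
              f"PairsRatio {pairs_ratio(ptnml):.2f}")
    # STC: Elo -2.35  nElo -4.62  PairsRatio 0.95
    # LTC: Elo -1.74  nElo -3.72  PairsRatio 0.96
    # VLTC SMP: Elo -1.70  nElo -3.95  PairsRatio 0.95
```

The W/L/D totals and the pentanomial counts give the same mean score; the pentanomial form is what allows the variance, and hence nElo, to account for the correlation between the two games of a pair.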

Bench: 1590774


jhellis3 · 2023-07-11T17:48:43Z

Obviously I support this, but can we create a classical branch for posterity, or even for academic purposes should people feel inclined? One of the original purposes of Stockfish was to serve as an example. There is a great deal of knowledge in those files, which should be preserved in an easily accessible way IMHO.

Disservin · 2023-07-11T17:52:08Z

> Obviously I support this, but can we create a classical branch for posterity or even academic purposes should people feel inclined? One of the original purposes of Stockfish was to serve as an example. There is great deal of knowledge in those files, which should be preserved in an easily accessible way IMHO.

I think a dedicated tag would fit in this case.

jhellis3 · 2023-07-11T17:57:02Z

A branch would be nicer.

vondele · 2023-07-11T17:57:13Z

There is a tag SF_classical that can be used to get the last classical version of SF.
git switch --detach SF_classical

jhellis3 · 2023-07-11T17:59:31Z

And if someone wants to make a pull request to classical?

vondele · 2023-07-11T18:00:34Z

It was already frozen; it wouldn't be merged.

jhellis3 · 2023-07-11T18:01:10Z

How can a branch that doesn't exist yet be frozen? And who voted on that?

Disservin · 2023-07-11T18:02:11Z

Classical was frozen, not the non-existent branch. From now on, classical won't be maintained anymore.

vondele · 2023-07-11T18:02:18Z

Patches to the classical code have been rejected for the last few years, if that's clearer.

jhellis3 · 2023-07-11T18:02:51Z

No, that did not answer my questions.

mstembera · 2023-07-11T19:43:29Z

We never officially released the strongest classical version. Since it's about to be completely removed, IMO it would be nice if we did so now. If we can agree on this, I volunteer to run a small number of tests to determine exactly which classical version was the strongest, along the lines of #3986 (comment). @vondele would you be OK with this?

vondele · 2023-07-11T20:02:00Z

I think there is no point in doing so. The strongest released classical engine is tagged sf_11; the strongest developed version is tagged SF_classical. If it was stronger at a later point, that's a property of the search, not of the evaluation.

mstembera · 2023-07-11T20:12:47Z

Right. The strongest classical version is tagged but was never released, which is what I'm asking for. Without released binaries, it only exists for us devs, not for the chess community at large.

vondele · 2023-07-11T20:18:18Z

I agree SF_classical has never been released. But there is no point in releasing it now, for the typical user this is just a weak engine. For a developer, who might want to have a look at the code, it is easily available.

closes official-stockfish#4674

Bench: 1444646

ChessOverflow · 2023-07-17T05:11:48Z

Related to discussion #4678 and at @cj5716's request, I'm posting here the positions that NNUE misevaluates.

Based on the NNUE evaluation, White is winning.
Based on the classical evaluation, Black is winning.

Current FEN is 8/p4p1p/5kp1/1p6/8/3b2P1/PPp2P1P/2R3K1 w - - 0 34
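For reference, a plain material count for this FEN can be computed as below. The conventional 1/3/3/5/9 piece values are an assumption for illustration (the classical evaluation used its own tuned values plus many positional terms), and the helper `material_from_fen` is hypothetical. It shows White with a rook and five pawns against Black's bishop and six pawns, one of them already on c2.

```python
from typing import Tuple

# Conventional piece values in pawns (illustrative, not Stockfish's tuned values).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_from_fen(fen: str) -> Tuple[int, int]:
    """Return (white, black) material from the board field of a FEN string."""
    ranks = fen.split()[0].split("/")
    if len(ranks) != 8:
        raise ValueError("FEN board field must have 8 ranks")
    white = black = 0
    for rank in ranks:
        squares = 0
        for ch in rank:
            if ch.isdigit():
                squares += int(ch)  # run of empty squares
                continue
            if ch.lower() not in PIECE_VALUES:
                raise ValueError(f"unknown piece {ch!r}")
            if ch.isupper():
                white += PIECE_VALUES[ch.lower()]
            else:
                black += PIECE_VALUES[ch]
            squares += 1
        if squares != 8:
            raise ValueError(f"rank {rank!r} does not cover 8 squares")
    return white, black


if __name__ == "__main__":
    fen = "8/p4p1p/5kp1/1p6/8/3b2P1/PPp2P1P/2R3K1 w - - 0 34"
    print(material_from_fen(fen))  # (10, 9): R + 5P vs B + 6P
```

The count alone slightly favours White, so it does not settle which evaluation is right.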

cj5716 · 2023-07-17T07:21:51Z

Do you have a decisive answer as to which side is winning? How do you know that it is not the HCE that misevaluates it?
At depth 47, homefish's search reports a score of -439 cp.

ChessOverflow · 2023-07-17T17:59:55Z

> Do you have a decisive answer as to which side is winning?

Yes, Black is clearly winning. See my last comment in #4678.

vondele added 2 commits July 11, 2023 17:04

fixup (e284f17)

vondele added the "to be merged" label (Will be merged shortly) Jul 11, 2023

vondele closed this in af110e0 Jul 11, 2023
