Remove classical evaluation
Since the introduction of NNUE (first released with Stockfish 12), we
have maintained the classical evaluation as part of SF in frozen form.
The idea that this code could lead to further inputs to the NN or
search did not materialize. Now, after five releases, this PR removes
the classical evaluation from SF. Even though this evaluation is
probably the best of its class, it has become unimportant for the
engine's strength, and there is little need to maintain this
code (roughly 25% of SF) going forward, or to expend resources on
trying to improve its integration in the NNUE eval.

Indeed, it still had a very limited use in the current SF, namely
for the evaluation of positions that are nearly decided based on
material difference, where the speed of the classical evaluation
outweighs its inaccuracies. This impact on strength is small,
roughly 2 Elo, and probably decreasing in importance as the TC grows.
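
As a rough illustration of the hybrid scheme described above, the choice of evaluator can be sketched as a threshold on material imbalance. This is a minimal sketch, not the removed Stockfish code: `LargeMaterialGap`, `fast_eval`, `accurate_eval`, and `prefers_fast_eval` are hypothetical names, the cutoff value is an assumption, and both evaluators are placeholders.

```cpp
#include <cstdlib>

// All names and values below are hypothetical, for illustration only.
using Value = int;

// Assumed cutoff (in internal units) above which a position counts as
// "nearly decided on material". The real threshold is not given in the text.
constexpr Value LargeMaterialGap = 2000;

// Placeholder for a cheap, less accurate evaluation: material balance only.
Value fast_eval(Value materialBalance) { return materialBalance; }

// Placeholder for a slower, more accurate evaluation: here it merely adds
// a small fixed correction so the two paths are distinguishable.
Value accurate_eval(Value materialBalance) { return materialBalance + 15; }

// The fast path is chosen only when the material gap is large, i.e. when
// speed matters more than precision.
bool prefers_fast_eval(Value materialBalance) {
    return std::abs(materialBalance) > LargeMaterialGap;
}

Value evaluate(Value materialBalance) {
    return prefers_fast_eval(materialBalance) ? fast_eval(materialBalance)
                                              : accurate_eval(materialBalance);
}
```

After this commit there is no such fast path: every position goes through the NN evaluation.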

Potentially, removal of this code could lead to the development of
techniques for faster, but less accurate, NN evaluation
of certain positions.

STC
https://tests.stockfishchess.org/tests/view/64a320173ee09aa549c52157
Elo: -2.35 ± 1.1 (95%) LOS: 0.0%
Total: 100000 W: 24916 L: 25592 D: 49492
Ptnml(0-2): 287, 12123, 25841, 11477, 272
nElo: -4.62 ± 2.2 (95%) PairsRatio: 0.95

LTC
https://tests.stockfishchess.org/tests/view/64a320293ee09aa549c5215b
Elo: -1.74 ± 1.0 (95%) LOS: 0.0%
Total: 100000 W: 25010 L: 25512 D: 49478
Ptnml(0-2): 44, 11069, 28270, 10579, 38
nElo: -3.72 ± 2.2 (95%) PairsRatio: 0.96

VLTC SMP
https://tests.stockfishchess.org/tests/view/64a3207c3ee09aa549c52168
Elo: -1.70 ± 0.9 (95%) LOS: 0.0%
Total: 100000 W: 25673 L: 26162 D: 48165
Ptnml(0-2): 8, 9455, 31569, 8954, 14
nElo: -3.95 ± 2.2 (95%) PairsRatio: 0.95
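
The Elo and PairsRatio figures above can be reproduced from the raw counts. The sketch below assumes the standard logistic Elo model, and assumes that the five Ptnml buckets count game pairs scoring 0, 0.5, 1, 1.5, and 2 points. `elo_from_wld` and `pairs_ratio` are illustrative helper names, not fishtest functions.

```cpp
#include <array>
#include <cmath>

// Logistic Elo difference implied by a win/loss/draw record.
// score is the average points per game (win = 1, draw = 0.5, loss = 0).
double elo_from_wld(long wins, long losses, long draws) {
    const double games = static_cast<double>(wins + losses + draws);
    const double score = (static_cast<double>(wins) + 0.5 * static_cast<double>(draws)) / games;
    return -400.0 * std::log10(1.0 / score - 1.0);
}

// Ratio of game pairs won (1.5 or 2 points) to game pairs lost (0 or 0.5 points).
// Assumes ptnml holds the pair counts for scores 0, 0.5, 1, 1.5, 2 in that order.
double pairs_ratio(const std::array<long, 5>& ptnml) {
    const double won  = static_cast<double>(ptnml[3] + ptnml[4]);
    const double lost = static_cast<double>(ptnml[0] + ptnml[1]);
    return won / lost;
}
```

With the STC counts, `elo_from_wld(24916, 25592, 49492)` gives about -2.35, and the pairs ratio (11477 + 272) / (287 + 12123) comes to about 0.95, in line with the reported values.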

closes #4674

Bench: 1444646
vondele committed Jul 11, 2023
1 parent 6a8767a commit af110e0
Showing 16 changed files with 37 additions and 2,745 deletions.
4 changes: 2 additions & 2 deletions src/Makefile
@@ -56,8 +56,8 @@ else
 endif

 ### Source and object files
-SRCS = benchmark.cpp bitbase.cpp bitboard.cpp endgame.cpp evaluate.cpp main.cpp \
-	material.cpp misc.cpp movegen.cpp movepick.cpp pawns.cpp position.cpp psqt.cpp \
+SRCS = benchmark.cpp bitboard.cpp evaluate.cpp main.cpp \
+	misc.cpp movegen.cpp movepick.cpp position.cpp psqt.cpp \
 	search.cpp thread.cpp timeman.cpp tt.cpp uci.cpp ucioption.cpp tune.cpp syzygy/tbprobe.cpp \
 	nnue/evaluate_nnue.cpp nnue/features/half_ka_v2_hm.cpp

9 changes: 0 additions & 9 deletions src/benchmark.cpp
@@ -153,24 +153,15 @@ vector<string> setup_bench(const Position& current, istream& is) {
   list.emplace_back("setoption name Hash value " + ttSize);
   list.emplace_back("ucinewgame");

-  size_t posCounter = 0;
-
   for (const string& fen : fens)
       if (fen.find("setoption") != string::npos)
           list.emplace_back(fen);
       else
       {
-          if (evalType == "classical" || (evalType == "mixed" && posCounter % 2 == 0))
-              list.emplace_back("setoption name Use NNUE value false");
-          else if (evalType == "NNUE" || (evalType == "mixed" && posCounter % 2 != 0))
-              list.emplace_back("setoption name Use NNUE value true");
           list.emplace_back("position fen " + fen);
           list.emplace_back(go);
-          ++posCounter;
       }

-  list.emplace_back("setoption name Use NNUE value true");
-
   return list;
 }

172 changes: 0 additions & 172 deletions src/bitbase.cpp

This file was deleted.

7 changes: 0 additions & 7 deletions src/bitboard.h
@@ -25,13 +25,6 @@

 namespace Stockfish {

-namespace Bitbases {
-
-void init();
-bool probe(Square wksq, Square wpsq, Square bksq, Color us);
-
-} // namespace Stockfish::Bitbases
-
 namespace Bitboards {

 void init();

12 comments on commit af110e0

@PaulJeFi

This is quite sad but comprehensible. The classical evaluation was a great source of inspiration for developers, but it is understandable that preserving it in the code is useless.

@amchess

Bad idea of the community: removal of the classical evaluation function.
It is indeed true that it is less strong than nnue, but, in reference to a human being,
- it is definitely better
- it is exploitable to adapt it to his level of play, since its elements are mappable to those found in traditional strategy books.
Basically, comparing classic Stockfish with nnue, you can see how far a human being can go and where the "inhuman" takes over.
Now, this genius makes that impossible.
This shows that the community members only care about pure game power, moreover, essentially, at ultra-rapid times.
This is of no help to the OTB player, indeed detrimental.
nnue evaluation function is a blackbox.
As such, you can't act on it: you can think of it as a linear combination of thousands of factors and relative weights both unknown.
In contrast, the classical evaluation function, certainly less strong, but better than any human being, has a dozen that can be mapped exactly to the thinking system of a human being.
Thus, such factors can be turned on or off as needed.
What we do and will continue to do with ShashChess with a REAL HANDICAP MODE simulating a human player.
What sense does it make for a human being to play the moves of an entity that often violates the strategic principles it already knows? Instead, the comparison with the classical evaluation function allows him to penetrate the meanders of nnue.
So instead the computer speaks Arabic even for Carlsen.
Research is meaningless unless it makes people's lives better: pure hypertrophic ego.
Now, the evaluation function is not at all dependent on the classical evaluation function.
Up to there, I'm all for it, but why remove it as an additional option?
The current stockfish handicap mode is ridiculous: random errors the more frequent the lower the elo.
Instead, a real handicap mode should simulate the thinking system of a certain elo range, which is impossible now for Stockfish.

@Vizvezdenec
Contributor

ur dad

@lucametehau

> Bad idea of the community: removal of the classical evaluation function. It is indeed true that it is less strong than nnue, but, in reference to a human being, -it is definitely, better -is exploitable to adapt it to his level of play, since its elements are mappable to those found in traditional strategy books. Basically, comparing classic Stockfish with nnue, you can see how far a human being can go and where the "inhuman" takes over. Now, this genius makes that impossible. This shows that the community members only care about pure game power, moreover, essentially, at ultra-rapid times. This is of no help to the OTB player, indeed detrimental. nnue evaluation function is a blackbox. As such, you can't act on it: you can think of it as a linear combination of thousands of factors and relative weights both unknown. In contrast, the classical evaluation function, certainly less strong, but better than any human being, has a dozen that can be mapped exactly to the thinking system of a human being. Thus, such factors can be turned on or off as needed. What we do and will continue to do with ShashChess with a REAL HANDICAP MODE simulating a human player. What sense does it make for a human being to play the moves of an entity that often violates the strategic principles it already knows? Instead, the comparison with the classical evaluation function allows him to penetrate the meanders of nnue. So instead the computer speaks Arabic even for Carlsen. Research is meaningless unless it makes people's lives better: pure hypertrophic ego. Now, the evaluation function is not at all dependent on the classical evaluation function. Up to there, I'm all for it, but why remove it as an additional option? The current stockfish handicap mode is ridiculous: random errors the more frequent the lower the elo. Instead, a real handicap mode should simulate the thinking system of a certain elo range, which is impossible now for Stockfish.

That's a big chunk of nothing

@amchess

By the way, Stockfish still has its handicap mode, completely random and experimentally not reflecting a player's elo in any way.
Why keep it, consistent with the choice to approve this pull request?

@AlexandreMasta

SF's play level should be just removed as well. Let the GUI do the job. Is a chunk of code that is not the objective of the engine. The objective of the engine is to play the best moves and win, period.

@amchess

> SF's play level should be just removed as well. Let the GUI do the job. Is a chunk of code that is not the objective of the engine. The objective of the engine is to play the best moves and win, period.

I agree that, for consistency, the current, admittedly ridiculous, handicap mode should also be eliminated.
However, I do not agree that a GUI should deal with a REAL handicap mode, i.e. one that simulates the thinking system of a player. This is only possible:

  1. in an NNUE approach, as is done in the MAIA project: using networks for elo bands;
  2. in a classical approach, turning classical factors on or off depending on the elo band.

I prefer 2. because it has finer granularity and requires much less computational resources and time.

@cj5716
Contributor

cj5716 commented on af110e0 Jul 31, 2023

however, even an extremely simple eval (aka simpleEval) is still extremely strong and can easily defeat many human players, so if you want an actually accurate skill level you will need to compromise search as well. For shashchess' skill level, have you tested it in order to ensure that it is accurate?

@amchess

> however, even an extremely simple eval (aka simpleEval) is still extremely strong and can easily defeat many human players, so if you want an actually accurate skill level you will need to compromise search as well. For shashchess' skill level, have you tested it in order to ensure that it is accurate?

see here
In general, nnue has flattened the level of play, and I don't really enjoy programming with the obsession of being number one.
I enjoy much more trying to help the practical player improve, in particular, to penetrate more into the mysteries of this game.
If the Stockfish people have fun like that, I respect them, but, to me, scientific research is meaningless if it doesn't make people's lives better.

@cj5716
Contributor

cj5716 commented on af110e0 Jul 31, 2023

IMO using just 40 such positions is not enough; maybe something like 10,000 positions would be more accurate.
Your opinion is fine and I wish you success in your exploits; however, I think it shouldn't affect SF (the parent project).

@amchess

amchess commented on af110e0 Jul 31, 2023

I agree, but it is already something since the positions are from human experience and we believe that technology should help people.
In fact, we have developed a batch application that extracts from a database of very high quality matches all those in which Shashchess provides all different moves at the 4 different elo bands + the nnue.
There are so many of them and we had planned to categorize them by phase of the game (opening, transition to midgame, midgame, transition to ending, and ending): 40 for each for a total of 200.
By submitting them to players of different elo we will have feedback.
This field of research is really imho much more fascinating and of practical use. The goal is to "understand chess".
