Commit
Since the introduction of NNUE (first released with Stockfish 12), we have maintained the classical evaluation as part of SF in frozen form. The idea that this code could lead to further inputs to the NN or search did not materialize. Now, after five releases, this PR removes the classical evaluation from SF.

Even though this evaluation is probably the best of its class, it has become unimportant for the engine's strength, and there is little need to maintain this code (roughly 25% of SF) going forward, or to expend resources on trying to improve its integration in the NNUE eval. Indeed, it still had a very limited use in the current SF, namely for the evaluation of positions that are nearly decided based on material difference, where the speed of the classical evaluation outweighs its inaccuracies. The impact on strength is small, roughly 2 Elo, and probably decreasing in importance as the TC grows. Potentially, removal of this code could lead to the development of techniques to have faster, but less accurate, NN evaluation for certain positions.

STC
https://tests.stockfishchess.org/tests/view/64a320173ee09aa549c52157
Elo: -2.35 ± 1.1 (95%) LOS: 0.0%
Total: 100000 W: 24916 L: 25592 D: 49492
Ptnml(0-2): 287, 12123, 25841, 11477, 272
nElo: -4.62 ± 2.2 (95%) PairsRatio: 0.95

LTC
https://tests.stockfishchess.org/tests/view/64a320293ee09aa549c5215b
Elo: -1.74 ± 1.0 (95%) LOS: 0.0%
Total: 100000 W: 25010 L: 25512 D: 49478
Ptnml(0-2): 44, 11069, 28270, 10579, 38
nElo: -3.72 ± 2.2 (95%) PairsRatio: 0.96

VLTC SMP
https://tests.stockfishchess.org/tests/view/64a3207c3ee09aa549c52168
Elo: -1.70 ± 0.9 (95%) LOS: 0.0%
Total: 100000 W: 25673 L: 26162 D: 48165
Ptnml(0-2): 8, 9455, 31569, 8954, 14
nElo: -3.95 ± 2.2 (95%) PairsRatio: 0.95

closes #4674

Bench: 1444646
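For readers unfamiliar with the code being deleted, the hybrid logic described above worked roughly like the following sketch: fall back to the cheap, hand-crafted evaluation only when the material balance already says the position is nearly decided. All identifiers, the threshold, and the stub types here are illustrative placeholders, not the actual Stockfish code removed by this commit.

```cpp
// Minimal, self-contained sketch of the kind of hybrid dispatch this commit
// removes. Names, the threshold, and the stub Position are placeholders.
#include <cstdlib>
#include <iostream>

using Value = int;

struct Position {
    Value psqBalance;     // rough material + piece-square balance (centipawns)
    Value classicalEval;  // stand-in for the hand-crafted evaluation
    Value nnueEval;       // stand-in for the network evaluation
};

Value classical_evaluate(const Position& pos) { return pos.classicalEval; }
Value nnue_evaluate(const Position& pos)      { return pos.nnueEval; }

Value evaluate(const Position& pos) {
    // If the material balance already says the game is nearly decided, the
    // fast classical evaluation was considered accurate enough; otherwise the
    // slower but stronger NNUE evaluation was used. The commit deletes the
    // first branch (and the roughly 25% of the codebase behind it).
    constexpr Value LargeAdvantage = 2000;  // placeholder threshold

    if (std::abs(pos.psqBalance) > LargeAdvantage)
        return classical_evaluate(pos);
    return nnue_evaluate(pos);
}

int main() {
    Position nearlyDecided {2500, 2400, 2450};
    Position balanced      {  30,   15,   -5};
    std::cout << evaluate(nearlyDecided) << ' ' << evaluate(balanced) << '\n';  // 2400 -5
}
```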
This is quite sad but comprehensible. The classical evaluation was a great source of inspiration for developers, but it is understandable that keeping it in the code no longer serves a purpose.
Removing the classical evaluation function is a bad idea by the community.
It is indeed true that it is weaker than NNUE, but, with reference to a human being:
- it is definitely better;
- it can be adapted to his level of play, since its elements map to those found in traditional strategy books.
Basically, by comparing classical Stockfish with NNUE, you can see how far a human being can go and where the "inhuman" takes over.
Now, this stroke of genius makes that impossible.
This shows that the community members only care about pure playing strength, and essentially at ultra-rapid time controls.
This is of no help to the OTB player; indeed, it is detrimental.
The NNUE evaluation function is a black box. As such, you can't act on it: you can think of it as a combination of thousands of factors and relative weights, both unknown.
In contrast, the classical evaluation function, certainly weaker but still better than any human being, has a dozen factors that can be mapped exactly to the thinking system of a human being.
Thus, such factors can be turned on or off as needed.
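For illustration only, here is a minimal sketch of the property being described: a hand-crafted evaluation whose named terms can be switched off individually, something a trained network does not offer. The term names, values, and switches are assumptions for this sketch, not actual Stockfish or ShashChess code.

```cpp
#include <iostream>

// Illustrative sketch: each hand-crafted term is a named, human-readable
// quantity that can be disabled independently. All values are placeholders.
struct TermScores {            // per-position term values, in centipawns
    int material;
    int mobility;
    int kingSafety;
    int pawnStructure;
};

struct TermSwitches {          // which "chapters of the strategy book" to use
    bool material      = true;
    bool mobility      = true;
    bool kingSafety    = true;
    bool pawnStructure = true;
};

int classical_eval(const TermScores& s, const TermSwitches& on) {
    int v = 0;
    if (on.material)      v += s.material;
    if (on.mobility)      v += s.mobility;
    if (on.kingSafety)    v += s.kingSafety;
    if (on.pawnStructure) v += s.pawnStructure;
    return v;   // a trained network offers no comparable per-concept switch
}

int main() {
    TermScores s {120, 35, -60, 10};
    TermSwitches beginner;          // e.g. a profile that ignores king safety
    beginner.kingSafety = false;
    std::cout << classical_eval(s, TermSwitches{}) << ' '
              << classical_eval(s, beginner) << '\n';   // 105 165
}
```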
That is what we do, and will continue to do, in ShashChess with a REAL HANDICAP MODE simulating a human player.
What sense does it make for a human being to play the moves of an entity that often violates the strategic principles he already knows? The comparison with the classical evaluation function, instead, allows him to penetrate the inner workings of NNUE.
Without it, the computer speaks an incomprehensible language, even to Carlsen.
Research is meaningless unless it makes people's lives better: otherwise it is pure hypertrophic ego.
Now, the NNUE evaluation function does not depend at all on the classical evaluation function.
Up to that point, I'm all for it, but why remove it as an additional option?
The current Stockfish handicap mode is ridiculous: random errors that become more frequent the lower the Elo.
Instead, a real handicap mode should simulate the thinking system of a certain Elo range, which is now impossible for Stockfish.
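To make the scheme being criticized concrete, here is a rough sketch of a "random error" handicap: the engine discards its best move with a probability that grows as the target Elo drops. The Elo-to-probability mapping is invented for this sketch, and this is not the actual Stockfish Skill Level implementation.

```cpp
// Sketch of a "random error" handicap of the kind criticized above.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <random>
#include <string>
#include <vector>

std::string pick_move(const std::vector<std::string>& legalMoves,
                      const std::string& bestMove,
                      int targetElo,
                      std::mt19937& rng) {
    // Arbitrary mapping: ~0% errors at 3000 Elo, ~50% at 1000 Elo.
    double errorProb = std::max(0.0, (3000.0 - targetElo) / 4000.0);

    std::bernoulli_distribution blunder(errorProb);
    if (blunder(rng)) {
        // A random error, unrelated to how a weaker human actually thinks.
        std::uniform_int_distribution<std::size_t> any(0, legalMoves.size() - 1);
        return legalMoves[any(rng)];
    }
    return bestMove;
}

int main() {
    std::mt19937 rng(42);
    std::vector<std::string> legal {"e2e4", "d2d4", "g1f3", "b1c3"};
    for (int elo : {1200, 2800})          // the weaker the target, the more blunders
        std::cout << elo << ": " << pick_move(legal, "e2e4", elo, rng) << '\n';
}
```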
ur dad
That's a big chunk of nothing.
By the way, Stockfish still has its handicap mode, which is completely random and, experimentally, does not reflect a player's Elo in any way.
Why keep it, if one wants to be consistent with the choice to approve this pull request?
SF's skill level should just be removed as well. Let the GUI do the job. It is a chunk of code that is not part of the objective of the engine. The objective of the engine is to play the best moves and win, period.
https://github.com/amchess/ShashChess/wiki/HandicapMode
I agree that, for consistency, the current, admittedly ridiculous, handicap mode should also be eliminated.
However, I do not agree that a GUI should deal with a REAL handicap mode, i.e. one that simulates the thinking system of a player. This is only possible either in the GUI or in the engine itself.
I prefer the latter because it has finer granularity and requires far fewer computational resources and much less time.
However, even an extremely simple eval (aka simpleEval) is still extremely strong and can easily defeat many human players, so if you want an actually accurate skill level you will need to compromise the search as well. As for ShashChess's skill level, have you tested it in order to ensure that it is accurate?
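To give an idea of how little is needed, here is a sketch of an "extremely simple" evaluation in the spirit of the comment: a bare material count from the side to move's point of view. This is an illustration only, not Stockfish's actual simple eval; the piece values and the FEN-based input are assumptions. Paired with a deep search, even something this crude easily defeats many human players.

```cpp
// Bare material count over the board field of a FEN string. Placeholder values.
#include <cctype>
#include <iostream>
#include <string>

int material_eval(const std::string& fenBoard, bool whiteToMove) {
    // Centipawn values for p, n, b, r, q (placeholder numbers).
    auto value = [](char c) {
        switch (c) {
            case 'p': return 100; case 'n': return 300; case 'b': return 300;
            case 'r': return 500; case 'q': return 900; default: return 0;
        }
    };
    int score = 0;  // white minus black
    for (char c : fenBoard) {
        if (std::isupper(static_cast<unsigned char>(c)))
            score += value(static_cast<char>(std::tolower(static_cast<unsigned char>(c))));
        else
            score -= value(c);
    }
    return whiteToMove ? score : -score;
}

int main() {
    // Board field of a FEN where Black is missing a rook: White is +500.
    std::cout << material_eval("rnbqkbn1/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR", true) << '\n';
}
```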
See here.
In general, NNUE has flattened the level of play, and I don't really enjoy programming with the obsession of being number one.
I enjoy much more trying to help the practical player improve and, in particular, to penetrate more deeply into the mysteries of this game.
If the Stockfish people have fun like that, I respect them, but, to me, scientific research is meaningless if it doesn't make people's lives better.
IMO using just 40 such positions is not enough; maybe something like 10,000 positions would be more accurate.
Your opinion is fine, and I wish you success in your exploits; however, I think it shouldn't affect SF (the parent project).
I agree, but it is already something, since the positions come from human experience and we believe that technology should help people.
In fact, we have developed a batch application that extracts, from a database of very high-quality games, all the positions in which ShashChess suggests a different move at each of the 4 Elo bands plus the NNUE evaluation.
There are many of them, and we plan to categorize them by game phase (opening, transition to the middlegame, middlegame, transition to the endgame, and endgame): 40 for each, for a total of 200.
By submitting them to players of different Elo we will get feedback.
This field of research is really, imho, much more fascinating and of practical use. The goal is to "understand chess".
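As a simplified stand-in for the filtering step of such a batch application, the sketch below keeps only the positions where the 4 Elo-band suggestions and the NNUE suggestion do not all agree. The input format (one line per position: FEN, a ';', then the five suggested moves) is an assumption for this sketch, not the real ShashChess tool.

```cpp
// Keep the lines whose suggested moves are not all identical.
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

int main(int argc, char* argv[]) {
    if (argc != 2) {
        std::cerr << "usage: filter <file with: FEN ; move1 move2 move3 move4 moveNNUE>\n";
        return 1;
    }
    std::ifstream in(argv[1]);
    std::string line;
    while (std::getline(in, line)) {
        const auto sep = line.find(';');
        if (sep == std::string::npos)
            continue;
        std::istringstream moveStream(line.substr(sep + 1));
        std::vector<std::string> moves(std::istream_iterator<std::string>{moveStream},
                                       std::istream_iterator<std::string>{});
        if (moves.size() < 2)
            continue;
        bool allEqual = true;
        for (const auto& mv : moves)
            if (mv != moves.front()) { allEqual = false; break; }
        if (!allEqual)
            std::cout << line << '\n';   // candidate position for human testing
    }
}
```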