Train not only from startpos #787
Comments
It is impossible to really know the consequences of all the options this opens up, but regarding 6), I know many worry about Leela's endgame play, and one thing comes to mind: the recent ELF OpenGo paper notes that they used a very aggressive (IMHO) resign rate of 5%, not only to improve game generation but to force the NN to learn more about the opening and middlegame, which they deemed more important. I agree with this assessment as a general rule. You do want to convert winning endgames, but you need to reach them first, and that is almost always a result of the earlier game stages. Am I suggesting this idea is bad? Not at all, but I warn against too-high expectations for it. One variant of this ties into the next idea of generating prior moves: when training from a non-starting position, the lack of history is a concern. Would a fake history affect the training negatively? Adding prior moves might mitigate this concern.
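To make the "fake history" option concrete, here is a minimal sketch of the simplest fallback: repeating the current position in every history slot instead of feeding empty planes. The plane layout and the function name are illustrative assumptions, not the actual Lc0 input encoder.

```python
import numpy as np

def encode_with_repeated_history(position_planes, history_len=8):
    """When self-play starts from an arbitrary position with no real
    game history, repeat the current position in all history slots
    rather than leaving them empty."""
    # position_planes: (planes_per_position, 8, 8) array for the current position.
    return np.concatenate([position_planes] * history_len, axis=0)
```

Whether this repetition biases the network (e.g. around the repetition planes) is exactly the open question above; generating a few plausible prior moves instead would avoid it at the cost of extra complexity.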
What are your thoughts on #342 as a solution to this? I.e., for a randomly chosen n, a 1-node policy-based search plays n moves and the resulting position is used as the start position.
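For reference, a small sketch of that #342-style scheme, assuming a hypothetical `policy_probs(board)` that returns the network's move priors (this is not the actual Lc0 API):

```python
import random
import chess  # python-chess

def pick_start_position(policy_probs, max_plies=30):
    """Play a randomly chosen number of plies by sampling moves directly
    from the policy head (a "1-node search"), and use the resulting
    position as the self-play start position."""
    board = chess.Board()
    n = random.randint(0, max_plies)
    for _ in range(n):
        if board.is_game_over():
            break
        moves, priors = zip(*policy_probs(board).items())  # {Move: prior}
        board.push(random.choices(moves, weights=priors, k=1)[0])
    return board.fen()
```

Because the moves are drawn from Leela's own policy, this keeps the start positions free of external (human or other-engine) bias, which matches the selection criteria discussed below.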
Note it's not only about openings (e.g. so that it performs better at TCEC and is useful for typical midgame analysis), but also about fixing blind spots. |
Note that we have LR drops during training; it's not trivial to make the LR schedule and a changing set of start positions work together.
So far the idea I like the most is to have a form on the website where everyone can submit start positions they want (the position before a blunder they saw, or positions they heard were used at TCEC).
In order not to overfit on endgame positions, it probably makes sense to scale the sampling frequency with the number of pieces remaining, or something like that (a rough sketch below).
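One way to implement that scaling is to weight each submitted position by its piece count when sampling; the weighting function below is only an illustration:

```python
import random
import chess  # python-chess

def sampling_weight(fen):
    board = chess.Board(fen)
    pieces = chess.popcount(board.occupied)  # 2..32 pieces on the board
    return pieces / 32.0                     # fewer pieces -> sampled less often

def sample_submitted_positions(fens, k=1):
    weights = [sampling_weight(f) for f in fens]
    return random.choices(fens, weights=weights, k=k)
```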
I mentioned this in #342, but we could use experience replay, which is a common ML technique: simply replay/retrain on previously played positions. There's also the concept of importance sampling in ER, where you score a training position and return to the ones with high (or low) scores more often. For example, score based on the KLD between the NN policy and the search policy in a position, which would result in retraining on that position if the KLD is large. One could also do something similar using the NN value and the result Z, or simply compute the loss function and use that as the score.
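A hedged sketch of what that could look like: a prioritized replay buffer keyed on the KL divergence between the search visit distribution and the raw NN policy. The buffer layout and sampling scheme are illustrative, not an existing part of the training pipeline.

```python
import heapq
import numpy as np

def kld_score(search_policy, net_policy, eps=1e-12):
    """KL(search || net): large when the net's priors disagree with search."""
    p = np.asarray(search_policy, dtype=np.float64) + eps
    q = np.asarray(net_policy, dtype=np.float64) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.heap = []      # (score, insertion_id, position_record)
        self.next_id = 0

    def add(self, record, search_policy, net_policy):
        score = kld_score(search_policy, net_policy)
        heapq.heappush(self.heap, (score, self.next_id, record))
        self.next_id += 1
        if len(self.heap) > self.capacity:
            heapq.heappop(self.heap)  # evict the lowest-score position

    def sample(self, k):
        """Sample positions with probability proportional to their KLD score,
        so high-disagreement positions are retrained on more often."""
        scores = np.array([s for s, _, _ in self.heap])
        if scores.sum() == 0:
            scores = scores + 1.0     # fall back to uniform sampling
        probs = scores / scores.sum()
        idx = np.random.choice(len(self.heap), size=k, p=probs, replace=False)
        return [self.heap[i][2] for i in idx]
```

The same skeleton works with a value-vs-Z score or the raw loss; only `kld_score` would change.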
An idea that I like is to rely on the subset of Chess960 positions which conform to standard castling rules, i.e. the Chess960 positions with Ra1, Rh1 and Ke1. There are 18 of these (see the enumeration sketch after the pros/cons). Pros:
Cons:
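A quick, standalone script to enumerate those 18 back ranks, just to make the subset explicit (not part of Lc0):

```python
from itertools import permutations

def standard_castling_960_backranks():
    """Chess960 back ranks with rooks on a/h, king on e, and bishops on
    opposite-coloured squares; the remaining pieces fill b, c, d, f, g."""
    ranks = set()
    for perm in set(permutations("QNNBB")):          # pieces on files b, c, d, f, g
        rank = "R" + "".join(perm[:3]) + "K" + "".join(perm[3:]) + "R"
        bishop_files = [i for i, p in enumerate(rank) if p == "B"]
        light_squares = {1, 3, 5}                    # b1, d1, f1 are light
        if len({f in light_squares for f in bishop_files}) == 2:  # one light, one dark
            ranks.add(rank)
    return sorted(ranks)

positions = standard_castling_960_backranks()
assert len(positions) == 18 and "RNBQKBNR" in positions
print(positions)
```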
There seem to be a lot of different ideas for 'start not only from startpos'. Going for a small opening book in a training run seems to warrant an issue in its own right. Although non-zero, a small opening book (as @mooskagh already built into the client code originally) can still just be an option (a startup parameter in lc0). As I understand it, this opening book would now reside in the lc0 engine for a given training run, since it now does self-play within the engine instead of the client directing that, correct?

What I'm wondering about is the general design. That is: what would tell the clients to use something other than startpos to begin with? In other words: the engine does self-play, and the client kicks that off on the local machine. How does the server that collects all the training games direct/control whether a 'non-start position' can be used by the clients?
That's more a topic for #541, but yes, both the clients and the server would have to be updated too.
I think the possibility of starting training not from the starting position was included; does this issue contain anything which is still relevant and not included into
(This issue is about position selection for training, not about the implementation of the feature in the engine; that is discussed in #541.)
It's a long-discussed topic, but it seems we don't have a GitHub issue for it yet.
AlphaZero had a goal (for some definition of goal) of winning at chess, and according to chess rules a game starts from startpos. That's why (and also for purity reasons) AlphaZero trained from startpos.
For us, it would be nice if Lc0 could also analyze less standard positions, and maybe it would be useful to explicitly close some blind spots.
For that we need a strategy for how to pick positions to use.
Ideally, it should be an automated process which adds variety and yields useful positions, but doesn't introduce external biases toward particular positions (e.g. it would be nice NOT to train only on positions from human chess, or any non-Leela chess, although I think that's hardly avoidable).
So, the ideas are:
I like this idea (idea number 6, train from endgame positions) a lot, but I don't know how it could be organized.
One idea would be to train a network which predicts the previous move from a position, and then generate checkmate positions and gradually "undo" them move by move (a rough sketch below). :)
Lots to implement, questionable usage.
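If anyone wants to experiment, the outer loop of that "undo from checkmate" idea might look like the sketch below; `predict_previous_position` is the hypothetical retro network and would have to be trained separately.

```python
import random
import chess  # python-chess

def startpos_by_undoing(mate_fen, predict_previous_position, max_undo_plies=40):
    """Walk backwards from a checkmate position for a random number of plies,
    validating each predicted predecessor, and return the result as a
    candidate training start position."""
    board = chess.Board(mate_fen)
    for _ in range(random.randint(1, max_undo_plies)):
        prev = predict_previous_position(board)  # hypothetical retro net
        if prev is None or not prev.is_valid():
            break
        board = prev
    return board.fen()
```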
So, put your ideas here.