
Train not only from startpos #787

Closed
mooskagh opened this issue Mar 8, 2019 · 11 comments
Labels
enhancement New feature or request

Comments

@mooskagh
Member

mooskagh commented Mar 8, 2019

(This is about position selection for training, not about implementing the feature in the engine; that is discussed in #541.)

This is a long-discussed topic, but it seems we don't have a GitHub issue for it yet.

AlphaZero had a goal (for some definition of goal) of winning at chess, and according to the rules of chess, play starts from the standard starting position. That's why (and also for purity reasons) AlphaZero trained only from startpos.
For us, it would be nice if Lc0 could also analyze less standard positions, and maybe it would be useful to explicitly close some blind spots.

For that, we need a strategy for picking the positions to use.

Ideally, it should be an automated process which adds variety and yields useful positions, but doesn't introduce external biases into the position mix (e.g. it would be nice NOT to train only on positions from human chess, or any non-Leela chess, although I think that's barely avoidable).

So ideas are:

  1. A 1-2 ply opening book with all possible moves forced.
  2. An opening book extracted from external games (maybe even lower-level FICS/lichess games for variety).
  3. Generate start positions from previous Leela games (e.g. when non-highest temp is used, see Separate exploration from training feedback #720), or when rare moves are possible (e.g. capture promotions).
  4. Train from Chess960 positions (Support Chess960 #786).
  5. Completely random positions? :)
  6. It's also often said that it would be nice to start training from the endgame and then gradually move towards the opening.

I like idea 6 (train from the endgame) a lot, but I don't know how it could be organized:

  1. Starting from endgame positions?.. From real games (external bias) or random positions (lots of useless ones)?
  2. Moving to middlegames?.. What to do then?

One idea would be to train a network which predicts the previous move from a position, then generate checkmate positions and gradually "undo" them move by move. :)
Lots to implement, for questionable benefit.

So, put your ideas here.
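For scale, idea 1 can be enumerated by hand: White has exactly 20 legal first moves (16 pawn moves plus 4 knight moves), and none of them restricts Black's 20 mirror-image replies, so a forced 2-ply book would contain 20 × 20 = 400 lines. A sketch in pure Python, with the move lists written out rather than generated by a chess library:

```python
# Idea 1 sketch: enumerate a forced 1-2 ply opening book by hand.
# No White first move gives check or occupies a square Black needs,
# so Black's 20 replies are the same after every White move.

FILES = "abcdefgh"

# White's 20 first moves in SAN: single/double pawn pushes + knight jumps.
pawn_pushes = [f + "3" for f in FILES] + [f + "4" for f in FILES]  # 16 moves
knight_jumps = ["Na3", "Nc3", "Nf3", "Nh3"]                        # 4 moves
white_first = pawn_pushes + knight_jumps

# Black's replies mirror White's by symmetry (pawns to ranks 6/5).
black_first = ([f + "6" for f in FILES] + [f + "5" for f in FILES]
               + ["Na6", "Nc6", "Nf6", "Nh6"])

two_ply_book = [(w, b) for w in white_first for b in black_first]

print(len(white_first))   # 20
print(len(two_ply_book))  # 400
```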

@mooskagh mooskagh added the enhancement New feature or request label Mar 8, 2019
@ASilver

ASilver commented Mar 8, 2019

It is impossible to really know the consequences of all the options this opens up, but regarding idea 6, I know many worry about Leela's endgame play, and one thing comes to mind:

The recent ELF OpenGo paper notes that they used a very aggressive (IMHO) resign rate of 5% to not only improve game generation, but force the NN to learn more about the opening and middlegame, which they deemed more important. I agree with this assessment as a general rule. You do want to convert winning endgames, but you need to achieve those first, and that is almost always a result of the previous game stages.

Am I suggesting this idea is bad? Not at all, but I warn against too-high expectations for it.

One variant of this might be the next idea, generating prior moves. When training from a non-starting position, the lack of history is a concern. Would a fake history affect training negatively? Adding prior moves might mitigate this concern.

@oscardssmith
Contributor

What are your thoughts on #342 as a solution to this? I.e. with a randomly chosen n, a 1-node policy-based search plays n moves to choose the start position.
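If I understand #342 correctly, the mechanism would look roughly like this. This is a hypothetical sketch only: `legal_moves` and `policy_priors` are made-up stand-ins for real engine calls, and moves are appended as text rather than applied to a board.

```python
# Sketch of the #342 idea: pick a random depth n, play n moves by
# sampling from the policy head alone (a "1-node search"), and use the
# resulting position to start a training game.
import random

def legal_moves(position):
    # Stand-in: a real engine would generate legal moves here.
    return ["e4", "d4", "Nf3", "c4"]

def policy_priors(position, moves):
    # Stand-in: a real NN would return one prior per move; uniform here.
    return [1.0 / len(moves)] * len(moves)

def sample_start_position(start, max_plies=20, seed=None):
    rng = random.Random(seed)
    n = rng.randrange(max_plies + 1)  # the randomly chosen depth n
    position = start
    for _ in range(n):
        moves = legal_moves(position)
        priors = policy_priors(position, moves)
        position += " " + rng.choices(moves, weights=priors, k=1)[0]
    return position  # stand-in: a move list, not a real board state

print(sample_start_position("startpos", seed=7))
```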

@mooskagh
Member Author

mooskagh commented Mar 8, 2019

Note it's not only about openings (e.g. so that it performs better at TCEC and is useful for typical middlegame analysis), but also about fixing blind spots.
E.g. we need to somehow also increase the presence of positions with two queens, capture promotions, etc., and other rare but important positions we don't even know about (and how would we find them?).

@brianprichardson

Option 7: simply skip the first N plies when training. As the net gets more mature, increase N.
Of course, all of the games would still start from startpos, so this would work best together with option 1 (a 1-2 move book).
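A sketch of what such a schedule could look like. The step and cap numbers here are made up purely for illustration:

```python
# Option 7 sketch: drop the first N plies of each game from the training
# data, and grow N as the net matures. Schedule numbers are arbitrary.

def skip_plies(net_generation, step=2, cap=30):
    # Generation 0 trains on full games; later nets skip more opening plies.
    return min(net_generation * step, cap)

def training_positions(game_positions, net_generation):
    n = skip_plies(net_generation)
    return game_positions[n:]

game = [f"pos_ply_{i}" for i in range(60)]
print(len(training_positions(game, 0)))    # 60: a young net sees everything
print(len(training_positions(game, 10)))   # 40: a mature net skips 20 plies
```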

@mooskagh
Member Author

mooskagh commented Mar 8, 2019

Note that we have LR drops during training; it's not trivial to make LR scheduling and a changing startpos set work together.

@mooskagh
Member Author

mooskagh commented Mar 8, 2019

So far the idea I like the most is to have a form on the website where everyone can submit start positions they want (e.g. the position before a blunder they saw, or positions they heard were used at TCEC).

  1. XX% of training games are generated from startpos.
  2. (100-XX)% of training games are generated from those submitted positions.

In order not to overfit on endgame positions, it probably makes sense to scale the frequency with the number of pieces remaining, or something like that.
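A sketch of that sampling scheme, assuming an 80/20 split and weights simply proportional to the remaining piece count. Both numbers are arbitrary placeholders, not a proposal for actual values:

```python
# Sketch: XX% of games start from startpos, the rest from submitted
# positions, with emptier boards down-weighted to avoid overfitting on
# endgames. The 0.8 split and linear piece-count weighting are arbitrary.
import random

def pick_start(submitted, startpos_fraction=0.8, rng=random.Random(0)):
    # `submitted` is a list of (fen, piece_count) pairs.
    if rng.random() < startpos_fraction or not submitted:
        return "startpos"
    fens = [fen for fen, _ in submitted]
    weights = [count for _, count in submitted]  # more pieces = more likely
    return rng.choices(fens, weights=weights, k=1)[0]

submitted = [("middlegame_fen", 24), ("endgame_fen", 6)]
picks = [pick_start(submitted) for _ in range(10000)]
print(picks.count("startpos") / len(picks))  # roughly 0.8
```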

@RedDenver

I mentioned this in #342, but we could use experience replay (ER), which is a common ML technique: simply replay/retrain on previously played positions. There's also the concept of importance sampling in ER, which means you score a training position and return to the ones with high (or low) scores more often. For example, score based on the KLD between the NN policy and the search policy in a position, which would result in retraining on that position if the KLD is large. We could also do something similar using the NN value and the result Z, or simply calculate the loss function and use that as the score.
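A minimal sketch of that KLD-based scoring. The buffer format and the proportional weighting here are my assumptions for illustration, not Lc0's actual training pipeline:

```python
# Sketch of importance sampling for experience replay: score each stored
# position by KL(search_policy || nn_policy) and resample proportionally,
# so positions where search disagreed most with the raw NN get revisited.
import math
import random

def kld(p, q, eps=1e-12):
    # KL divergence of distribution p from q, with eps to avoid log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def replay_sample(buffer, k, rng=random.Random(0)):
    # buffer: list of (position, nn_policy, search_policy) tuples.
    scores = [kld(search, nn) for _, nn, search in buffer]
    return rng.choices(buffer, weights=scores, k=k)

buffer = [
    ("pos_agree",    [0.5, 0.5], [0.5, 0.5]),  # KLD = 0, never resampled
    ("pos_disagree", [0.9, 0.1], [0.2, 0.8]),  # large KLD, always resampled
]
sample = replay_sample(buffer, k=5)
print([name for name, _, _ in sample])
```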

@rosenthj

An idea that I like is to rely on the subset of Chess960 positions which conform to standard castling rules, i.e. the Chess960 positions with Ra1, Rh1 and Ke1. There are 18 of these.
If we further relax the constraint that the start position has to be symmetric (not the case for the other suggestions here) and instead have each side select independently from these arrangements, we get 18^2 = 324 starting positions.

Pros:

  • Position is from same class of positions as regular starting position in the following sense:
    • Pawn structure is still completely open and yet to be defined.
    • Castling is still possible in either direction.
    • Pieces are still completely undeveloped.
  • Unlike in Chess960, Leela could already play from these positions.
    • Castling follows the same rules and patterns as in regular chess.
  • Trivial to implement.
  • Patterns which are infrequent in regular (and especially superhuman) play, but which could occur with reasonable play, should naturally be seen more frequently, e.g. a bishop or knight on d1. I feel this is especially true with regard to pawn structures.
  • Positions are (arguably) fairly unbiased by human opinions.

Cons:

  • This does nothing to increase late-game pattern frequency.
  • We might be sampling from some unusual patterns too much, which might damage regular play from the starting position.
  • I feel it's a bit artificial in a sense. This is not a set of positions the general population relates to directly.
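The counts above are easy to verify with a few lines of Python (a sketch for checking the arithmetic, not an implementation proposal):

```python
# Verify: Chess960 back ranks constrained to Ke1, Ra1, Rh1, with bishops
# on opposite colors, leave 18 arrangements (and 18^2 = 324 asymmetric
# pairings if each side picks independently).
from itertools import permutations

FREE = "bcdfg"  # back-rank files besides a/h (rooks) and e (king)

def color(f):
    # File parity distinguishes the two square colors on rank 1.
    return "abcdefgh".index(f) % 2

backranks = set()
for perm in set(permutations("QBBNN")):
    placed = dict(zip(FREE, perm))
    b1, b2 = [f for f in FREE if placed[f] == "B"]
    if color(b1) != color(b2):  # bishops must be on opposite colors
        backranks.add("R" + placed["b"] + placed["c"] + placed["d"]
                      + "K" + placed["f"] + placed["g"] + "R")

print(len(backranks))       # 18 (includes the standard RNBQKBNR)
print(len(backranks) ** 2)  # 324
```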

@MelleKoning
Contributor

MelleKoning commented Mar 14, 2019

There seem to be a lot of different ideas for 'starting not only from startpos'.

It seems that going for a small opening book in a training run warrants an issue of its own.

Although non-zero: a small opening book (as @mooskagh originally built into the client code) can still just be an option (a startup parameter in lc0). As I understand it, this opening book would now have to reside in the lc0 engine for a given training run, since the engine now does self-play itself instead of the client directing it, correct?

What I'm wondering about is the general design. That is: what would tell the clients to use something other than startpos to begin with? In other words: the engine does self-play, and the client kicks that off on the local machine. How does the server that collects all the training games direct/control whether a non-start position is used by the clients?
I'm assuming this needs to be clear before implementation can start? It isn't clear to me, so could somebody shed some light on it?

@mooskagh
Member Author

That's more a topic for #541, but yes, both clients and the server have to be updated too.
Clients should download a list of openings, and the server should somehow tell them what to download.
It's also a question whether it should be a static file on the server or e.g. something generated dynamically from a database based on desired frequencies.
But in any case, implementing it in the engine is the first step, as it's not blocked by anything else.

@Naphthalin
Contributor

I think the possibility of starting training from positions other than the starting position was included; does this issue contain anything which is still relevant and not included in master, or can it be closed?

@mooskagh mooskagh closed this as completed May 9, 2020