Train not only from startpos #787
Comments
It is impossible to really know the consequences of all the options this opens up, but regarding 6), I know many worry about Leela's endgame play, and one thing comes to mind: the recent ELF OpenGo paper notes that they used a very aggressive (IMHO) resign rate of 5%, not only to improve game generation but to force the NN to learn more about the opening and middlegame, which they deemed more important. I agree with this assessment as a general rule. You do want to convert winning endgames, but you need to reach them first, and that is almost always a result of the earlier game stages. Am I suggesting this idea is bad? Not at all, but I warn against too-high expectations for it. One variant of this ties into the next idea of generating prior moves: when training from a non-starting position, the lack of history is a concern. Would a fake history affect the training negatively? Adding prior moves might mitigate this concern.
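To make the "fake history" option concrete, here is a minimal sketch of the simplest fallback: repeating the current position in every history slot instead of feeding empty planes. The plane layout and the function name are illustrative assumptions, not the actual Lc0 input encoder.

```python
import numpy as np

def encode_with_repeated_history(position_planes, history_len=8):
    """When self-play starts from an arbitrary position with no real
    game history, repeat the current position in all history slots
    rather than leaving them empty."""
    # position_planes: (planes_per_position, 8, 8) array for the current position.
    return np.concatenate([position_planes] * history_len, axis=0)
```

Whether this repetition biases the network (e.g. around the repetition planes) is exactly the open question above; generating a few plausible prior moves instead would avoid it at the cost of extra complexity.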
What are your thoughts on #342 as a solution to this? I.e., for a randomly chosen n, a 1-node policy-based search plays n moves and the resulting position is used as the start position.
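For reference, a small sketch of that #342-style scheme, assuming a hypothetical `policy_probs(board)` that returns the network's move priors (this is not the actual Lc0 API):

```python
import random
import chess  # python-chess

def pick_start_position(policy_probs, max_plies=30):
    """Play a randomly chosen number of plies by sampling moves directly
    from the policy head (a "1-node search"), and use the resulting
    position as the self-play start position."""
    board = chess.Board()
    n = random.randint(0, max_plies)
    for _ in range(n):
        if board.is_game_over():
            break
        moves, priors = zip(*policy_probs(board).items())  # {Move: prior}
        board.push(random.choices(moves, weights=priors, k=1)[0])
    return board.fen()
```

Because the moves are drawn from Leela's own policy, this keeps the start positions free of external (human or other-engine) bias, which matches the selection criteria discussed below.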
Note it's not only about openings (e.g. so that it performs better at TCEC and is useful for typical midgame analysis), but also about fixing blind spots. |
Note that we have LR drops during training; it's not trivial to make the LR schedule and a changing set of start positions work together.
So far the idea I like the most is to have a form on the website where everyone can submit start positions they want (the position before a blunder they saw, or positions they heard were used at TCEC).
In order not to overfit on endgame positions, it probably makes sense to scale the sampling frequency with the number of pieces remaining, or something like that (a rough sketch below).
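One way to implement that scaling is to weight each submitted position by its piece count when sampling; the weighting function below is only an illustration:

```python
import random
import chess  # python-chess

def sampling_weight(fen):
    board = chess.Board(fen)
    pieces = chess.popcount(board.occupied)  # 2..32 pieces on the board
    return pieces / 32.0                     # fewer pieces -> sampled less often

def sample_submitted_positions(fens, k=1):
    weights = [sampling_weight(f) for f in fens]
    return random.choices(fens, weights=weights, k=k)
```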
I mentioned this in #342, but we could use experience replay, which is a common ML technique: simply replay/retrain on previously played positions. There's also the concept of importance sampling in ER, where you score a training position and return to the ones with high (or low) scores more often. For example, score based on the KLD between the NN policy and the search policy in a position, which would result in retraining on that position if the KLD is large. One could also do something similar using the NN value and the result Z, or simply compute the loss function and use that as the score.
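A hedged sketch of what that could look like: a prioritized replay buffer keyed on the KL divergence between the search visit distribution and the raw NN policy. The buffer layout and sampling scheme are illustrative, not an existing part of the training pipeline.

```python
import heapq
import numpy as np

def kld_score(search_policy, net_policy, eps=1e-12):
    """KL(search || net): large when the net's priors disagree with search."""
    p = np.asarray(search_policy, dtype=np.float64) + eps
    q = np.asarray(net_policy, dtype=np.float64) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.heap = []      # (score, insertion_id, position_record)
        self.next_id = 0

    def add(self, record, search_policy, net_policy):
        score = kld_score(search_policy, net_policy)
        heapq.heappush(self.heap, (score, self.next_id, record))
        self.next_id += 1
        if len(self.heap) > self.capacity:
            heapq.heappop(self.heap)  # evict the lowest-score position

    def sample(self, k):
        """Sample positions with probability proportional to their KLD score,
        so high-disagreement positions are retrained on more often."""
        scores = np.array([s for s, _, _ in self.heap])
        if scores.sum() == 0:
            scores = scores + 1.0     # fall back to uniform sampling
        probs = scores / scores.sum()
        idx = np.random.choice(len(self.heap), size=k, p=probs, replace=False)
        return [self.heap[i][2] for i in idx]
```

The same skeleton works with a value-vs-Z score or the raw loss; only `kld_score` would change.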
An idea that I like is to rely on the subset of Chess960 positions which conform to standard castling rules, i.e. the Chess960 positions with Ra1, Rh1 and Ke1. There are 18 of these (see the enumeration sketch after the pros/cons). Pros:
Cons:
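A quick, standalone script to enumerate those 18 back ranks, just to make the subset explicit (not part of Lc0):

```python
from itertools import permutations

def standard_castling_960_backranks():
    """Chess960 back ranks with rooks on a/h, king on e, and bishops on
    opposite-coloured squares; the remaining pieces fill b, c, d, f, g."""
    ranks = set()
    for perm in set(permutations("QNNBB")):          # pieces on files b, c, d, f, g
        rank = "R" + "".join(perm[:3]) + "K" + "".join(perm[3:]) + "R"
        bishop_files = [i for i, p in enumerate(rank) if p == "B"]
        light_squares = {1, 3, 5}                    # b1, d1, f1 are light
        if len({f in light_squares for f in bishop_files}) == 2:  # one light, one dark
            ranks.add(rank)
    return sorted(ranks)

positions = standard_castling_960_backranks()
assert len(positions) == 18 and "RNBQKBNR" in positions
print(positions)
```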
There seem to be a lot of different ideas for 'start not only from startpos'. Going for a small opening book in a training run seems to warrant an issue in its own right. Although non-zero, a small opening book (as @mooskagh already built into the client code originally) can still just be an option (a startup parameter in lc0). As I understand it, this opening book would now reside in the lc0 engine for a given training run, since it now does self-play within the engine instead of the client directing that, correct?

What I'm wondering about is the general design. That is: what would tell the clients to use something other than startpos to begin with? In other words: the engine does self-play, and the client kicks that off on the local machine. How does the server that collects all the training games direct/control whether a 'non-start position' can be used by the clients?
That's more a topic for #541, but yes, both the clients and the server would have to be updated too.
I think the possibility of starting training not from the starting position was included; does this issue contain anything which is still relevant and not included into
(This issue is about position selection for training, not about the implementation of the feature in the engine; that is discussed in #541.)
It's a long-discussed topic, but it seems we don't have a GitHub issue for it yet.
AlphaZero had a goal (for some definition of goal) of winning at chess, and according to chess rules a game starts from startpos. That's why (and also for purity reasons) AlphaZero trained from startpos.
For us, it would be nice if Lc0 could also analyze less standard positions, and maybe it would be useful to explicitly close some blind spots.
For that we need a strategy for how to pick positions to use.
Ideally, it should be an automated process which adds variety and yields useful positions, but doesn't introduce external biases toward particular positions (e.g. it would be nice NOT to train only on positions from human chess, or any non-Leela chess, although I think that's hardly avoidable).
So, the ideas are:
I like this idea (idea number 6, train from endgame positions) a lot, but I don't know how it could be organized.
One idea would be to train a network which predicts the previous move from a position, and then generate checkmate positions and gradually "undo" them move by move (a rough sketch below). :)
Lots to implement, questionable usage.
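If anyone wants to experiment, the outer loop of that "undo from checkmate" idea might look like the sketch below; `predict_previous_position` is the hypothetical retro network and would have to be trained separately.

```python
import random
import chess  # python-chess

def startpos_by_undoing(mate_fen, predict_previous_position, max_undo_plies=40):
    """Walk backwards from a checkmate position for a random number of plies,
    validating each predicted predecessor, and return the result as a
    candidate training start position."""
    board = chess.Board(mate_fen)
    for _ in range(random.randint(1, max_undo_plies)):
        prev = predict_previous_position(board)  # hypothetical retro net
        if prev is None or not prev.is_valid():
            break
        board = prev
    return board.fen()
```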
So, put your ideas here.