Home

Welcome to the LMPlay wiki!

This wiki is mainly here to share results and further details on the various experiments. The main graphs below show all the currently running experiments in one place:

Log plots

Loss

[Plot: log_loss]

Accuracy

[Plot: log_accuracy]

Diff plots

These are 'diff' plots, where every run is compared against a single reference run: in this case, the 'GPT2ish' 12-layer model.
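As a rough illustration of what comparing every run to one reference run can look like, here is a minimal sketch. It is not the code that produced these plots: the function name, the run names, and the loss values are all made up, and a plain per-step subtraction is an assumption, since the exact transformation behind the plots is not described here.

```python
def diff_against_reference(runs, reference):
    """Subtract the reference run's values from every other run, step by step.

    `runs` maps a run name to a list of per-step metric values.
    Plain subtraction is an assumption; the wiki does not specify
    how the plotted diffs are actually computed.
    """
    ref = runs[reference]
    diffs = {}
    for name, values in runs.items():
        if name == reference:
            continue
        # Only compare the steps that both runs have reached so far.
        n = min(len(ref), len(values))
        diffs[name] = [values[i] - ref[i] for i in range(n)]
    return diffs


# Hypothetical loss values, purely for illustration.
runs = {
    "reference": [4.0, 3.5, 3.0],
    "candidate": [3.5, 2.5],
}
diffs = diff_against_reference(runs, "reference")
```

With a loss metric, a negative diff under this convention would mean the run is ahead of the reference at that step; for accuracy the sign reads the other way.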

Loss

[Plot: log_diff_loss]

Accuracy

[Plot: log_diff_accuracy]

The experiment name gives some useful information. It is broken into <name>_<layers>L_<context size>_<training set>. Sometimes an experiment name will also include a version or other numeric information. In the case of the 'ue' (Unified Embeddings) experiments, the name is broken into ue_<version>_<internal embedding multiple>.
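The naming scheme can be unpacked mechanically. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the example experiment name is invented rather than taken from the plots, the 'ue' pattern is assumed to describe the <name> part of the full experiment name, and the training set is assumed to contain no underscores.

```python
from typing import NamedTuple


class ExperimentName(NamedTuple):
    name: str
    layers: int
    context_size: str
    training_set: str


def parse_experiment_name(full_name: str) -> ExperimentName:
    """Split '<name>_<layers>L_<context size>_<training set>' into its parts."""
    # Split from the right so underscores inside <name> are preserved.
    # Assumes <training set> itself contains no underscores.
    name, layers, context_size, training_set = full_name.rsplit("_", 3)
    if not layers.endswith("L") or not layers[:-1].isdigit():
        raise ValueError(f"expected '<layers>L', got {layers!r}")
    return ExperimentName(name, int(layers[:-1]), context_size, training_set)


def parse_ue_name(name: str) -> tuple:
    """Split 'ue_<version>_<internal embedding multiple>' into (version, multiple)."""
    prefix, version, multiple = name.split("_", 2)
    if prefix != "ue":
        raise ValueError(f"not a 'ue' experiment name: {name!r}")
    return version, multiple


# Invented example name, not one of the actual runs.
parsed = parse_experiment_name("ue_1_16x_6L_1024_wiki")
ue_version, ue_multiple = parse_ue_name(parsed.name)
```

Splitting from the right keeps any version or other numeric suffix attached to the <name> part, which is why the two patterns are parsed in separate steps.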

These plots should make it easy to see that the 16x UE is beating the 8x UE by a large margin, and that the UEs in general are far ahead of the GPT2ish models, even though the 8x and 16x UE models have exactly the same parameter count and structure as the GPT2ish 6-layer model. The only difference between a UE 6L and a GPT2ish 6L is the UE training. The graphs show the 16x UE learning nearly 10x faster than the equivalent GPT2ish model, and that gap is rapidly increasing. It looks likely that it will learn not only faster but also much deeper. In fact, it may end up learning much faster and deeper than even the 12-layer GPT2ish model, despite having half the layers. Much longer training runs are required to really prove this out, but the results so far are encouraging.
