
Can not read weights.pb files #8

Open
Muffty opened this issue Mar 1, 2019 · 11 comments

Comments

@Muffty

Muffty commented Mar 1, 2019

Hi, I'm trying to use the weights file I downloaded with download_latest_network.py.
It is not .txt.gz but .pb.gz, and read_weights_file does not seem to handle that.
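One quick way to tell which format a downloaded file is in — a heuristic helper I'm sketching here, not part of lcztools — is to peek at the decompressed bytes: the old text weights format starts with an ASCII version number, while the protobuf format begins with binary tag bytes.

```python
import gzip

def looks_like_text_weights(path):
    """Heuristic: the text weights format starts with an ASCII version
    number, while the protobuf (.pb) format usually begins with tag
    bytes that are not valid ASCII text."""
    with gzip.open(path, "rb") as f:
        head = f.read(64)
    try:
        head.decode("ascii")
        return True   # plausibly the old .txt.gz format
    except UnicodeDecodeError:
        return False  # plausibly the protobuf .pb.gz format
```

This is only a sniff test (binary data can coincidentally decode as ASCII), but it is enough to see that the file download_latest_network.py fetches is not the format read_weights_file expects.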

@advait

advait commented Apr 24, 2019

@so-much-meta can you advise here? Very eager to continue progress on this project, but this seems to be a hard blocker. Happy to help update any code if you can point us in the right direction. Thanks!

@so-much-meta
Owner

so-much-meta commented Apr 24, 2019 via email

@Sumegh-git

Sumegh-git commented Jun 10, 2020

@so-much-meta could you provide some help with how to work with the .pb files that are currently available on the Lc0 page?

I tried renaming it to .txt, but then it throws a size mismatch error:

```
line 153, in from_weights_file
    param.data.copy_(w.view_as(param))
RuntimeError: shape '[8, 112, 3, 3]' is invalid for input of size 8640
```
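A quick sanity check on the numbers in that traceback (standalone arithmetic, nothing repo-specific): a tensor of shape [8, 112, 3, 3] holds 8 × 112 × 3 × 3 = 8064 values, but the file supplied 8640, so the layer shapes in the newer networks simply don't match the architecture lcztools expects.

```python
# Elements required by the tensor shape named in the error message.
expected = 8 * 112 * 3 * 3
print(expected)  # 8064 — the file supplied 8640, so the shapes disagree
```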

@ZararB

ZararB commented Jan 16, 2021

You can convert the pb.gz to a txt.gz file using the save_txt function of the Net class, but this won't fix the issue, because the nets on the website are a newer version and aren't compatible. You can try to make them compatible, but I found it easier to do this:

Download the model and config yaml files from here:

https://www.comp.nus.edu.sg/~sergio-v/new/128x10-t60-2/

Then load the model using the tfprocess module:

```python
import yaml
import tfprocess

class KerasNet:

    def __init__(self, model_file="128x10-t60-2-5300.pb.gz", cfg_file="configs/128x10-t60-2.yaml"):
        with open(cfg_file, "rb") as f:
            cfg = f.read()

        cfg = yaml.safe_load(cfg)
        print(yaml.dump(cfg, default_flow_style=False))

        tfp = tfprocess.TFProcess(cfg)
        tfp.init_net_v2()
        tfp.replace_weights_v2(model_file)

        self.model = tfp.model

    def evaluate(self, leela_board):
        input_planes = leela_board.lcz_features()
        model_input = input_planes.reshape(1, 112, 64)
        policy, value, _ = self.model.predict(model_input)
        return policy, value
```

@lzanini

lzanini commented Jan 25, 2021

@ZararB Thanks, this was very helpful!

Do you happen to have a solution for reading current training data? Using TarTrainingFile yields the following error:

```
Exception: Only version 3 of training data supported
```
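For diagnosing this, my understanding is that each record in an Lc0 training chunk starts with a little-endian uint32 version field, which is what the loader checks before bailing out. A standalone sketch for peeking at that field from raw record bytes (the helper name is mine, not from this library):

```python
import struct

def peek_training_version(record_bytes):
    """Read the leading little-endian uint32 version field from a raw
    Lc0 training record (an assumption about the on-disk layout)."""
    (version,) = struct.unpack_from("<I", record_bytes, 0)
    return version
```

If the value that comes back is greater than 3, the records are a newer generation than lcztools' parser supports.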

@ZararB

ZararB commented Jan 26, 2021

@lzanini Sorry, I am only using the network for evaluation and haven't looked into training.

@lzanini

lzanini commented Jan 26, 2021

@ZararB thank you for your answer.

Did you manage to get correct evaluations using an lczero-training network with inputs computed by this library?

I get the network to run, but the output values are surprising, to say the least, and very different from the ones I get with the engine limited to nodes=1, which makes me wonder whether the inputs are still valid for the latest networks.

@ZararB

ZararB commented Jan 26, 2021

@lzanini yeah the code I wrote before was a bit misleading. The policy network output needs to be run through the softmax function to get probabilities. Here is the full thing:

```python
import numpy as np
import yaml
import tfprocess
from collections import OrderedDict

class KerasNet:

    def __init__(self, model_file="128x10-t60-2-5300.pb.gz", cfg_file="configs/example.yaml"):
        with open(cfg_file, "rb") as f:
            cfg = f.read()

        cfg = yaml.safe_load(cfg)
        print(yaml.dump(cfg, default_flow_style=False))

        tfp = tfprocess.TFProcess(cfg, gpu=True)
        tfp.init_net_v2()
        tfp.replace_weights_v2(model_file)
        self.model = tfp.model

    def _softmax(self, x, softmax_temp=1.0):
        e_x = np.exp((x - np.max(x)) / softmax_temp)
        return e_x / e_x.sum(axis=0)

    def _evaluate(self, leela_board):
        input_planes = leela_board.lcz_features()
        model_input = input_planes.reshape(1, 112, 64)
        model_output = self.model.predict(model_input)

        policy_logits = model_output[0][0]

        legal_uci = [m.uci() for m in leela_board.generate_legal_moves()]

        if legal_uci:
            legal_indexes = leela_board.lcz_uci_to_idx(legal_uci)
            softmaxed = self._softmax(policy_logits[legal_indexes])
            softmaxed_aspython = map(float, softmaxed)
            policy_legal = OrderedDict(sorted(zip(legal_uci, softmaxed_aspython),
                                              key=lambda mp: (mp[1], mp[0]),
                                              reverse=True))
        else:
            policy_legal = OrderedDict()

        return policy_legal
```

I copied most of this from the lcztools code and just modified it a little so it works with my setup.

@lzanini

lzanini commented Jan 27, 2021

@ZararB Where does the term x - np.max(x) come from in your softmax definition?

The usual definition (and the one defined in tensorflow) is simply

softmax = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis)

Since the training script uses the standard tf.nn.softmax_cross_entropy_with_logits directly on the model output (here and here), I don't see why the values need to be normalized before going through the softmax.

@ZararB

ZararB commented Jan 27, 2021

@lzanini Both definitions are equivalent. The max is usually subtracted from all the logits so that we work with smaller numbers and avoid overflow/NaN issues. You can use either one.
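A quick NumPy check (standalone, not repo code) shows the two forms agree on ordinary inputs, and that the shift is what keeps large logits from overflowing:

```python
import numpy as np

def softmax_plain(x):
    e = np.exp(x)
    return e / e.sum()

def softmax_shifted(x):
    # Subtracting the max multiplies numerator and denominator by the
    # same constant exp(-max), so it cancels and the result is identical.
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([1.0, 2.0, 3.0])
assert np.allclose(softmax_plain(logits), softmax_shifted(logits))

big = np.array([1000.0, 1001.0, 1002.0])
# exp(1000) overflows float64 to inf, so the plain form produces NaNs...
with np.errstate(over="ignore", invalid="ignore"):
    assert np.isnan(softmax_plain(big)).any()
# ...while the shifted form stays finite and matches the small-logit case.
assert np.allclose(softmax_shifted(big), softmax_shifted(big - 1000.0))
```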

@lzanini

lzanini commented Jan 27, 2021

You're right! Thanks again 👍
