GPU error #73
Comments
It has been a while, but for a "self" play run, try running without any weight file; it should create one to start with.
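Before trying that, it may help to see which saved model files are already present. The following is a minimal sketch, assuming the model files live in `data/model` (the directory shown in the log in this issue) and use the .json and .h5 extensions mentioned in this thread. The helper name `find_model_files` is hypothetical and not part of the project.

```python
from pathlib import Path


def find_model_files(model_dir):
    """Return saved model files found in model_dir, grouped by extension.

    Assumption: architecture files end in .json and weight files in .h5,
    as described in this thread. This helper is not part of the project.
    """
    root = Path(model_dir)
    return {
        "json": sorted(p.name for p in root.glob("*.json")),
        "h5": sorted(p.name for p in root.glob("*.h5")),
    }


if __name__ == "__main__":
    # Assumed location, taken from the path printed in the issue's log.
    found = find_model_files("data/model")
    if not found["json"] and not found["h5"]:
        print("No saved model files found; a 'self' run should create them.")
    else:
        print("Existing model files:", found)
```

If files are listed, moving them to a backup folder (rather than deleting them) keeps the option of restoring them later.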
Thanks, brianprichardson. Do you mean it never runs with the GPU?
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 0 for 'input_batchnorm/cond/Reshape_4' (op: 'Reshape') with input shapes: [1,256,1,1], [].
Depending on the situation, the weights (.h5) and model (.json) files must match the net architecture in the config file (typically mini.py). The stronger ones that I uploaded do not match the current config files. IIRC, when running "self", if there are no .h5 and .json files they will be created first. You can add

For running "uci", it just tries to read the best files. Other params in the config file can still be set, but most are ignored for uci, like playouts is 1,200 (sort of like a fixed number of nodes).

The first output you posted shows it is trying to run with the gpu. As slow as it is, it will be far too slow to run without a gpu, and your 1080ti is a very good one. I would try a clean download and just try to run with "uci", then enter "uci" and "isready" (remember to wait for the readyok), and then "go". You should get a bestmove after some time. If that works, then your packages and gpu are all working ok and we can work from there.

What are you trying to do, in general? Self-play training is extremely slow and takes a lot of disk space for the intermediate input plane files. That's why I have a tweaked version that takes pgn input and trains directly from that.
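The "uci" check described above (send "uci" and "isready", wait for readyok, then send "go" and wait for a bestmove) can also be scripted. This is only a sketch: the function name `uci_handshake` is made up for illustration, and the launch command assumes the engine starts the same way as the "self" run in this issue, with "uci" as the mode argument.

```python
import subprocess
import sys


def uci_handshake(send, lines):
    """Drive a UCI engine: uci -> isready -> (wait for readyok) -> go.

    send:  callable that writes one command line to the engine.
    lines: iterator over the engine's output lines.
    Returns the first 'bestmove ...' line, or None if output ends first.
    """
    send("uci")
    send("isready")
    for line in lines:
        if line.strip() == "readyok":
            break
    else:
        return None  # engine output ended before it reported readyok
    send("go")
    for line in lines:
        if line.startswith("bestmove"):
            return line.strip()
    return None


if __name__ == "__main__":
    # Assumption: same entry point as the "self" run, with "uci" as the mode.
    proc = subprocess.Popen(
        [sys.executable, "src/chess_zero/run.py", "uci"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        bufsize=1,
    )

    def send(cmd):
        proc.stdin.write(cmd + "\n")
        proc.stdin.flush()

    print(uci_handshake(send, iter(proc.stdout.readline, "")))
    if proc.poll() is None:  # only ask a still-running engine to quit
        send("quit")
    proc.wait()
```

If this prints a bestmove line, the packages and GPU are probably set up correctly; if it prints None, the engine exited early and its error output is the place to look.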
This issue might be related to #75 and #76.
What is the command to run this?
First, only do: Then, after it loads, enter:
I get the following error logs when I issue isready
I got the same errors:
See #75; there is a link to a fork with a working version.
I tried to run it with the GPU and got the following error. Can anyone help me with this?
(Python36) D:\chess\chess-alpha-zero>python src/chess_zero/run.py self
2018-10-10 11:45:55,139@chess_zero.manager INFO # config type: mini
Using TensorFlow backend.
2018-10-10 11:45:59,436@chess_zero.agent.model_chess DEBUG # loading model from D:\chess\chess-alpha-zero\data\model\model_best_config.json
2018-10-10 11:45:59.478648: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-10-10 11:45:59.695745: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2018-10-10 11:45:59.790370: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2018-10-10 11:45:59.795932: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0, 1
2018-10-10 11:48:20.448740: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-10 11:48:20.451530: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0 1
2018-10-10 11:48:20.453788: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N N
2018-10-10 11:48:20.455816: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 1: N N
2018-10-10 11:48:20.458363: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8795 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-10-10 11:48:20.834375: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8795 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1567, in _create_c_op
c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 0 for 'input_batchnorm/cond/Reshape_4' (op: 'Reshape') with input shapes: [1,256,1,1], [].
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "src/chess_zero/run.py", line 20, in <module>
manager.start()
File "src\chess_zero\manager.py", line 64, in start
return self_play.start(config)
File "src\chess_zero\worker\self_play.py", line 25, in start
return SelfPlayWorker(config).start()
File "src\chess_zero\worker\self_play.py", line 45, in __init__
self.current_model = self.load_model()
File "src\chess_zero\worker\self_play.py", line 85, in load_model
if self.config.opts.new or not load_best_model_weight(model):
File "src\chess_zero\lib\model_helper.py", line 15, in load_best_model_weight
return model.load(model.config.resource.model_best_config_path, model.config.resource.model_best_weight_path)
File "src\chess_zero\agent\model_chess.py", line 145, in load
self.model = Model.from_config(json.load(f))
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\engine\network.py", line 1032, in from_config
process_node(layer, node_data)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\engine\network.py", line 991, in process_node
layer(unpack_singleton(input_tensors), **kwargs)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\engine\base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\layers\normalization.py", line 206, in call
training=training)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 3123, in in_train_phase
x = switch(training, x, alt)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 3058, in switch
else_expression_fn)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\util\deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2072, in cond
orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1913, in BuildCondBranch
original_result = fn()
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\layers\normalization.py", line 167, in normalize_inference
epsilon=self.epsilon)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 1908, in batch_normalization
mean = tf.reshape(mean, (-1))
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 6112, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
op_def=op_def)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1734, in __init__
control_input_ops)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1570, in _create_c_op
raise ValueError(str(e))
ValueError: Shape must be rank 1 but is rank 0 for 'input_batchnorm/cond/Reshape_4' (op: 'Reshape') with input shapes: [1,256,1,1], [].
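A note on reading this error: the failing frame calls `tf.reshape(mean, (-1))`. In Python, `(-1)` is just the integer -1, because parentheses alone do not make a tuple, so the shape argument is a scalar (rank 0). A one-element shape would be written `(-1,)` or `[-1]` (rank 1). That lines up with the "must be rank 1 but is rank 0" message and the empty `[]` input shape, and it suggests the installed Keras and TensorFlow versions may not agree on this call. The snippet below is a pure-Python illustration only; `shape_arg_rank` is a hypothetical helper, not a TensorFlow function.

```python
def shape_arg_rank(shape):
    """Rank of a shape argument: 0 for a bare int, 1 for a flat list/tuple of ints."""
    if isinstance(shape, int):
        return 0
    if isinstance(shape, (list, tuple)) and all(isinstance(d, int) for d in shape):
        return 1
    raise TypeError("unsupported shape argument: %r" % (shape,))


print(type((-1)).__name__, shape_arg_rank((-1)))    # int 0
print(type((-1,)).__name__, shape_arg_rank((-1,)))  # tuple 1
print(type([-1]).__name__, shape_arg_rank([-1]))    # list 1
```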