Dear DeepMD developers,
I installed DeePMD-kit on a server using the conda method from the official easy-install guide (https://github.com/deepmodeling/deepmd-kit/blob/master/doc/install/easy-install.md#with-conda). The command I used is:
conda create -n deepmd_tst deepmd-kit=2.0.0=*gpu libdeepmd=2.0.0=*gpu lammps-dp cudatoolkit=10.1 horovod -c https://conda.deepmodeling.org
I then ran "dp -h", and the output suggests that DeePMD-kit was installed correctly:
2022-01-24 20:17:04.096599: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:From /lustre/home/acct-msekmr/msekmr/anaconda3/envs/deepmd/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
/lustre/home/acct-msekmr/msekmr/anaconda3/envs/deepmd/lib/python3.9/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged.
_bootstrap._exec(spec, module)
usage: dp [-h] [--version] {config,transfer,train,freeze,test,compress,doc-train-input,model-devi,convert-from} ...
DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
Valid subcommands:
{config,transfer,train,freeze,test,compress,doc-train-input,model-devi,convert-from}
config fast configuration of parameter file for smooth model
transfer pass parameters to another model
train train a model
freeze freeze the model
test test the model
compress compress a model
doc-train-input print the documentation (in rst format) of input training parameters.
model-devi calculate model deviation
convert-from convert lower model version to supported version
However, when I tested the official water example with "dp train water.json", I got the following output:
2022-01-24 19:49:52.461464: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:From /lustre/home/acct-msekmr/msekmr/anaconda3/envs/deepmd_tst/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
/lustre/home/acct-msekmr/msekmr/anaconda3/envs/deepmd_tst/lib/python3.9/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged.
_bootstrap._exec(spec, module)
/lustre/home/acct-msekmr/msekmr/anaconda3/envs/deepmd_tst/lib/python3.9/site-packages/deepmd/common.py:334: UserWarning: the key n_neuron is deprecated, please use fitting_neuron instead
warnings.warn(f"the key {ii} is deprecated, please use {key} instead")
/lustre/home/acct-msekmr/msekmr/anaconda3/envs/deepmd_tst/lib/python3.9/site-packages/deepmd/utils/compat.py:50: UserWarning: It seems that you are using a deepmd-kit input of version 0.x.x, which is deprecated. we have converted the input to >2.0.0 compatible
warnings.warn(msg)
2022-01-24 19:50:04.562682: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-24 19:50:04.566883: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-01-24 19:50:04.881041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:b7:00.0 name: Tesla V100-SXM3-32GB computeCapability: 7.0
coreClock: 1.597GHz coreCount: 80 deviceMemorySize: 31.75GiB deviceMemoryBandwidth: 913.62GiB/s
2022-01-24 19:50:04.881215: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.1
2022-01-24 19:50:04.888855: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.10
2022-01-24 19:50:04.888972: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.10
2022-01-24 19:50:04.894870: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2022-01-24 19:50:04.896524: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2022-01-24 19:50:04.901345: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.10
2022-01-24 19:50:04.903826: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.10
2022-01-24 19:50:04.925130: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.7
2022-01-24 19:50:04.937177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-01-24 19:50:04.937262: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.1
2022-01-24 19:50:07.387275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-01-24 19:50:07.387406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2022-01-24 19:50:07.387430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2022-01-24 19:50:07.416026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9774 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM3-32GB, pci bus id: 0000:b7:00.0, compute capability: 7.0)
2022-01-24 19:50:07.416709: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2022-01-24 19:50:07.433709: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2700000000 Hz
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 24,25,30,72,73,78
OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #157: KMP_AFFINITY: 6 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #287: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #192: KMP_AFFINITY: 1 socket x 3 cores/socket x 2 threads/core (3 total cores)
OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 24 maps to socket 1 core 0 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 72 maps to socket 1 core 0 thread 1
OMP: Info #172: KMP_AFFINITY: OS proc 25 maps to socket 1 core 1 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 73 maps to socket 1 core 1 thread 1
OMP: Info #172: KMP_AFFINITY: OS proc 30 maps to socket 1 core 8 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 78 maps to socket 1 core 8 thread 1
OMP: Info #254: KMP_AFFINITY: pid 248434 tid 248784 thread 1 bound to OS proc set 25
OMP: Info #254: KMP_AFFINITY: pid 248434 tid 248787 thread 2 bound to OS proc set 30
OMP: Info #254: KMP_AFFINITY: pid 248434 tid 248788 thread 3 bound to OS proc set 72
OMP: Info #254: KMP_AFFINITY: pid 248434 tid 248789 thread 4 bound to OS proc set 73
OMP: Info #254: KMP_AFFINITY: pid 248434 tid 248790 thread 5 bound to OS proc set 78
OMP: Info #254: KMP_AFFINITY: pid 248434 tid 248791 thread 6 bound to OS proc set 24
OMP: Info #254: KMP_AFFINITY: pid 248434 tid 248785 thread 7 bound to OS proc set 25
OMP: Info #254: KMP_AFFINITY: pid 248434 tid 248792 thread 8 bound to OS proc set 30
OMP: Info #254: KMP_AFFINITY: pid 248434 tid 248793 thread 9 bound to OS proc set 72
OMP: Info #254: KMP_AFFINITY: pid 248434 tid 248794 thread 10 bound to OS proc set 73
OMP: Info #254: KMP_AFFINITY: pid 248434 tid 248795 thread 11 bound to OS proc set 78
OMP: Info #254: KMP_AFFINITY: pid 248434 tid 248796 thread 12 bound to OS proc set 24
DEEPMD INFO training data with min nbor dist: 0.8763010118574123
DEEPMD INFO training data with max nbor size: [38, 72]
Traceback (most recent call last):
File "/lustre/home/acct-msekmr/msekmr/anaconda3/envs/deepmd_tst/bin/dp", line 10, in <module>
sys.exit(main())
File "/lustre/home/acct-msekmr/msekmr/anaconda3/envs/deepmd_tst/lib/python3.9/site-packages/deepmd/entrypoints/main.py", line 437, in main
train_dp(**dict_args)
File "/lustre/home/acct-msekmr/msekmr/anaconda3/envs/deepmd_tst/lib/python3.9/site-packages/deepmd/entrypoints/train.py", line 91, in train
jdata = update_sel(jdata)
File "/lustre/home/acct-msekmr/msekmr/anaconda3/envs/deepmd_tst/lib/python3.9/site-packages/deepmd/entrypoints/train.py", line 341, in update_sel
descrpt_data = update_one_sel(jdata, descrpt_data)
File "/lustre/home/acct-msekmr/msekmr/anaconda3/envs/deepmd_tst/lib/python3.9/site-packages/deepmd/entrypoints/train.py", line 318, in update_one_sel
if parse_auto_sel(descriptor['sel']) :
KeyError: 'sel'
I did not change anything after the installation. I also tried installing other versions by changing the version specifications in the conda command, but the same error appeared. Could you suggest how to solve it?
Thank you very much for your time!
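To help narrow down the KeyError above, here is a minimal Python sketch that checks whether an input file has a "sel" entry in its descriptor block. It assumes the 2.x input layout, where the descriptor sits under "model" -> "descriptor". The helper name `has_sel` and the sample values are illustrative only and are not part of DeePMD-kit. The sketch does not reproduce the 0.x-to-2.x conversion that dp performs, so a file in the older layout will simply report that "sel" is missing.

```python
import json


def has_sel(config):
    """Return True if config["model"]["descriptor"] contains a "sel" key.

    Hypothetical helper for inspection only; it mirrors the lookup that
    fails in the traceback (descriptor['sel']) but is not DeePMD-kit code.
    """
    descriptor = config.get("model", {}).get("descriptor", {})
    return isinstance(descriptor, dict) and "sel" in descriptor


def check_file(path):
    """Load a JSON input file and report whether the descriptor has "sel"."""
    with open(path) as fh:
        return has_sel(json.load(fh))


# Illustrative in-memory inputs (values are placeholders, not recommendations).
with_sel = {"model": {"descriptor": {"type": "se_e2_a", "sel": [46, 92]}}}
without_sel = {"model": {"descriptor": {"type": "se_e2_a"}}}

print(has_sel(with_sel))
print(has_sel(without_sel))
```

Running `check_file("water.json")` on the input used for training would show whether the file on disk already lacks "sel" or whether the key is lost later, for example during the input conversion mentioned in the warnings.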