A large error induced by compression for se_e3 descriptor #2250

Closed
njzjz opened this issue Jan 13, 2023 Discussed in #2182 · 2 comments · Fixed by #2552
Labels
bug critical Critical bugs that may break the results without messages reproduced This bug has been reproduced by developers

Comments

@njzjz
Member

njzjz commented Jan 13, 2023

Discussed in #2182

Originally posted by shihao-code December 15, 2022
When I used a hybrid descriptor of se_e2_a and se_e3, the RMSE of the deep potential was very small (3 meV/atom for energy and 59 meV/Ang for atomic force). After compressing the potential, however, the RMSE became much larger (16 meV/atom for energy and 64 meV/Ang for atomic force). If I use only the se_e2_a descriptor, keeping the other parameters in the input.json file unchanged, there is no change before and after compression. If only the se_e3 descriptor is used, compression again induces a large error.
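
For readers who want to quantify this kind of regression themselves, the comparison can be sketched as follows. This is a minimal illustration, not the script used in the report: the function names are hypothetical, the predictions are assumed to have already been collected from the original and the compressed model, and every frame is assumed to contain the same number of atoms.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between two equal-length sequences."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def energy_rmse_per_atom(pred_energies, ref_energies, natoms):
    """RMSE of per-frame total energies divided by the atom count.

    Assumption: all frames contain the same number of atoms.
    """
    return rmse(pred_energies, ref_energies) / natoms


def compression_delta(ref, original_pred, compressed_pred):
    """Return three RMSE values for one quantity (energies or force components).

    1. original model vs. reference labels
    2. compressed model vs. reference labels
    3. compressed model vs. original model (the error added by compression)
    """
    return (
        rmse(original_pred, ref),
        rmse(compressed_pred, ref),
        rmse(compressed_pred, original_pred),
    )
```

The third value is the most direct indicator: it compares the two models with each other, so it stays near zero for a faithful compression regardless of how well the original model fits the reference data.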

Version of deepmd-kit: 2.1.5_cuda11.6

Command I used: dp compress -i FeH.pb -o FeH-compress.pb --step 0.002
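
A reproduction usually evaluates both the original and the compressed model on the same data. The sketch below only builds the command lines and does not run them. The `dp compress` flags are the ones from the command above; the `dp test` flags (`-m`, `-s`, `-n`) are written from memory of the deepmd-kit CLI and should be checked against `dp test -h`. The helper names, the data directory, and the frame count are hypothetical.

```python
import shlex


def compress_command(frozen_model, compressed_model, step=None):
    """Build the `dp compress` argument list used in this report."""
    cmd = ["dp", "compress", "-i", frozen_model, "-o", compressed_model]
    if step is not None:
        cmd += ["--step", str(step)]
    return cmd


def dp_test_command(model, system_dir, n_frames):
    """Build a `dp test` argument list (flags assumed; verify with `dp test -h`)."""
    return ["dp", "test", "-m", model, "-s", system_dir, "-n", str(n_frames)]


def as_shell(cmd):
    """Render an argument list as a single shell-quoted string."""
    return shlex.join(cmd)


if __name__ == "__main__":
    print(as_shell(compress_command("FeH.pb", "FeH-compress.pb", step=0.002)))
    # Hypothetical validation directory and frame count.
    for model in ("FeH.pb", "FeH-compress.pb"):
        print(as_shell(dp_test_command(model, "./validation_data", 100)))
```

The lists can be passed to `subprocess.run` as they are, which avoids shell quoting problems with paths that contain spaces.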

The output of compression:

Loading BaseGPU/2021
  Loading requirement: nvhpc/21.3 cuda/11.2 openmpi/4.0.3cu11.2.v2
WARNING:tensorflow:From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS.
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
/sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged.
  _bootstrap._exec(spec, module)
DEEPMD INFO    


DEEPMD INFO    stage 1: compress the model
DEEPMD INFO     _____               _____   __  __  _____           _     _  _   
DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |  
DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_ 
DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_ 
DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
DEEPMD INFO    Please read and cite:
DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
DEEPMD INFO    installed to:         /home/conda/feedstock_root/build_artifacts/deepmd-kit_1663923590539/work/_skbuild/linux-x86_64-3.10/cmake-install
DEEPMD INFO    source :              v2.1.5
DEEPMD INFO    source brach:         HEAD
DEEPMD INFO    source commit:        6e3d4a62
DEEPMD INFO    source commit at:     2022-09-23 16:10:28 +0800
DEEPMD INFO    build float prec:     double
DEEPMD INFO    build variant:        cuda
DEEPMD INFO    build with tf inc:    /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/tensorflow/include;/sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/include
DEEPMD INFO    build with tf lib:    
DEEPMD INFO    ---Summary of the training---------------------------------------
DEEPMD INFO    running on:           gpu0501
DEEPMD INFO    computing device:     gpu:0
DEEPMD INFO    CUDA_VISIBLE_DEVICES: 0,1
DEEPMD INFO    Count of visible GPU: 2
DEEPMD INFO    num_intra_threads:    0
DEEPMD INFO    num_inter_threads:    0
DEEPMD INFO    -----------------------------------------------------------------
DEEPMD INFO    training without frame parameter
DEEPMD INFO    training data with lower boundary: [-0.22680075 -0.29381635]
DEEPMD INFO    training data with upper boundary: [30.16753829 41.82551879]
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0
OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #157: KMP_AFFINITY: 1 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #287: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #192: KMP_AFFINITY: 1 socket x 1 core/socket x 1 thread/core (1 total cores)
OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0 
OMP: Info #254: KMP_AFFINITY: pid 449229 tid 449422 thread 0 bound to OS proc set 0
OMP: Info #254: KMP_AFFINITY: pid 449229 tid 449421 thread 1 bound to OS proc set 0
DEEPMD INFO    training data with lower boundary: [-1505.35165116 -4165.88651941]
DEEPMD INFO    training data with upper boundary: [1505.35165116 4165.88651941]
DEEPMD INFO    built lr
DEEPMD INFO    built network
DEEPMD INFO    built training
DEEPMD INFO    initialize model from scratch
INFO:tensorflow:/sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.index
DEEPMD INFO    /sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.index
INFO:tensorflow:0
DEEPMD INFO    0
INFO:tensorflow:/sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.data-00000-of-00001
DEEPMD INFO    /sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.data-00000-of-00001
INFO:tensorflow:69300
DEEPMD INFO    69300
INFO:tensorflow:/sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.meta
DEEPMD INFO    /sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.meta
INFO:tensorflow:1659000
DEEPMD INFO    1659000
DEEPMD INFO    finished compressing
DEEPMD INFO    


DEEPMD INFO    stage 2: freeze the model
INFO:tensorflow:Restoring parameters from model-compression/model.ckpt
DEEPMD INFO    Restoring parameters from model-compression/model.ckpt
DEEPMD INFO    The following nodes will be frozen: ['model_type', 'descrpt_attr/rcut', 'descrpt_attr/ntypes', 'model_attr/tmap', 'model_attr/model_type', 'model_attr/model_version', 'train_attr/min_nbor_dist', 'train_attr/training_script', 'o_energy', 'o_force', 'o_virial', 'o_atom_energy', 'o_atom_virial', 'fitting_attr/dfparam', 'fitting_attr/daparam']
WARNING:tensorflow:From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/deepmd/entrypoints/freeze.py:246: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
DEEPMD WARNING From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/deepmd/entrypoints/freeze.py:246: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
WARNING:tensorflow:From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/tensorflow/python/framework/convert_to_constants.py:925: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
DEEPMD WARNING From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/tensorflow/python/framework/convert_to_constants.py:925: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
DEEPMD INFO    1258 ops in the final graph.

My input.json file:

        "descriptor": {
            "type": "hybrid",
            "list": [
                {
                    "type": "se_e2_a",
                    "sel": "auto",
                    "rcut_smth": 0.5,
                    "activation_function": "tanh",
                    "rcut": 6.5,
                    "neuron": [
                        30,
                        60,
                        120
                    ],
                    "resnet_dt": false,
                    "axis_neuron": 32,
                    "seed": 13290,
                    "_comment": " that's all"
                },
                {
                    "type": "se_e3",
                    "sel": "auto",
                    "rcut_smth": 0.5,
                    "activation_function": "tanh",
                    "rcut": 5.0,
                    "neuron": [
                        5,
                        10,
                        20
                    ],
                    "resnet_dt": false,
                    "seed": 1327,
                    "_comment": " that's all"
                }
            ]
        },
        "fitting_net": {
            "neuron": [
                320,
                320,
                320
            ],
            "resnet_dt": true,
            "seed": 6374,
            "_comment": " that's all"
        },
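
The report isolates the problem by training with each sub-descriptor on its own while leaving everything else unchanged. A minimal sketch of how such single-descriptor variants could be derived from a hybrid configuration is shown below. The function name is hypothetical, and it assumes it is given the dictionary that directly contains the "descriptor" and "fitting_net" keys shown above (the enclosing section of input.json is not part of the fragment).

```python
import copy


def single_descriptor_variants(model_section):
    """Return one (descriptor_type, config) pair per hybrid sub-descriptor.

    Each config is a deep copy of `model_section` in which the hybrid
    descriptor is replaced by exactly one of its sub-descriptors; all
    other keys (e.g. "fitting_net") are left unchanged.
    """
    descriptor = model_section["descriptor"]
    if descriptor.get("type") != "hybrid":
        raise ValueError("expected a descriptor of type 'hybrid'")

    variants = []
    for sub in descriptor["list"]:
        variant = copy.deepcopy(model_section)
        variant["descriptor"] = copy.deepcopy(sub)
        variants.append((sub["type"], variant))
    return variants
```

Training, freezing, and compressing each variant separately then shows which descriptor is responsible: in this report the se_e2_a variant is unaffected by compression, while the se_e3 variant is not.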
@njzjz njzjz added bug reproduced This bug has been reproduced by developers labels Jan 13, 2023
@DingChangjie

Hi, I've also found this issue in my hybrid-descriptor ZrC potential, where the accuracy deteriorates severely after model compression. I used the latest v2.2.1 version of deepmd-kit. It seems that this issue has not yet been fixed...?

njzjz added a commit to njzjz/deepmd-kit that referenced this issue May 22, 2023
@njzjz njzjz linked a pull request May 22, 2023 that will close this issue
@njzjz njzjz moved this from Todo to Done in Bugfixes for DeePMD-kit May 22, 2023
wanghan-iapcm pushed a commit that referenced this issue May 22, 2023
Fix #2250.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
@njzjz
Member Author

njzjz commented May 22, 2023

Fixed in #2552.

@njzjz njzjz closed this as completed May 22, 2023
@njzjz njzjz added the critical Critical bugs that may break the results without messages label Sep 26, 2023