
Converter bugfixes/improvements #119

Merged: 18 commits into master from converter2 on Jan 28, 2020
Conversation

drasmuss (Member)

Fixes some issues found in the converter for more complex models, and improves the performance of converted networks.

drasmuss force-pushed the converter2 branch 2 times, most recently from a517739 to 30ef415 (December 19, 2019)
drasmuss force-pushed the converter2 branch 7 times, most recently from 00ffd7a to 85f9846 (January 10, 2020)
drasmuss changed the title from "WIP Converter bugfixes/improvements" to "Converter bugfixes/improvements" (January 10, 2020)
drasmuss force-pushed the converter2 branch 2 times, most recently from 11ceedc to cf275f5 (January 16, 2020)
drasmuss (Member Author)

Note for posterity: the transform=None changes caused performance to decrease on the integrator benchmark, but after looking into this for a while I am fairly confident that this is a quirk of that particular model rather than a general issue. Removing some of the unnecessary x*1 ElementwiseInc operators means that one of the remaining ElementwiseInc operators ends up writing to a partial signal block rather than the whole signal block, which is less efficient. That is just an artifact of how the operators and signals happen to be ordered/merged in that particular model, not an effect we'd expect to see in general. Larger-scale tests (e.g. on the Spaun model) show an overall speedup with the transform=None changes.

In case it is useful in the future, here is a benchmark script I made to test the cost of different read/write types:

from collections import defaultdict
import timeit

import numpy as np
import tensorflow as tf
from tensorflow.python.eager import profiler
from tensorflow.python.ops.gen_state_ops import (
    TemporaryVariable,
    DestroyTemporaryVariable,
)

tf.config.optimizer.set_experimental_options({"disable_meta_optimizer": True})

minibatch_size = 64
base_shape = (minibatch_size, 16384)
read_write_size = 4096
reps = 1000

with tf.Graph().as_default() as graph:
    results = defaultdict(list)

    # random (integer) indices into the feature axis of the base signal
    idxs = tf.constant(
        np.random.randint(0, base_shape[1], size=read_write_size), dtype=tf.int32
    )
    # (minibatch, index) coordinate pairs for the scatter_nd operations
    idxs_nd = tf.stack(
        tf.meshgrid(tf.range(minibatch_size, dtype=tf.int32), idxs, indexing="ij"),
        axis=-1,
    )

    base = tf.compat.v1.placeholder(shape=base_shape, dtype=tf.float32)

    read_identity = tf.compat.v1.placeholder(
        shape=(minibatch_size, read_write_size), dtype=tf.float32
    )

    # chain each repetition after the previous one via control dependencies,
    # so that the ops cannot be executed in parallel or pruned
    for i in range(reps):
        with tf.control_dependencies(results["read_identity"]):
            results["read_identity"] = [read_identity]

        with tf.control_dependencies(results["read_slice"]):
            results["read_slice"] = [
                tf.strided_slice(base, [0, 0], [minibatch_size, read_write_size])
            ]

        with tf.control_dependencies(results["read_gather"]):
            results["read_gather"] = [tf.gather(base, idxs, axis=1)]

        with tf.control_dependencies(results["read_slice_concat"]):
            results["read_slice_concat"] = [
                tf.concat(
                    [
                        tf.strided_slice(
                            base, [0, 0], [minibatch_size, read_write_size // 2]
                        ),
                        tf.strided_slice(
                            base,
                            [0, base_shape[1] - read_write_size // 2],
                            [minibatch_size, base_shape[1]],
                        ),
                    ],
                    axis=1,
                )
            ]

        with tf.control_dependencies(results["write_assign"]):
            results["write_assign"] = [read_identity]

        with tf.control_dependencies(results["write_assign_add"]):
            if i == 0:
                results["write_assign_add"] = [read_identity]
            else:
                results["write_assign_add"] = [
                    results["write_assign_add"][0] + read_identity
                ]

        with tf.control_dependencies(results["write_scatter_add"]):
            results["write_scatter_add"] = [
                tf.tensor_scatter_nd_add(base, idxs_nd, read_identity)
            ]

        with tf.control_dependencies(results["write_scatter_update"]):
            results["write_scatter_update"] = [
                tf.tensor_scatter_nd_update(base, idxs_nd, read_identity)
            ]

        with tf.control_dependencies(results["write_temp_var_add"]):
            var = TemporaryVariable(shape=base.shape, dtype=base.dtype)
            var_name = var.op.name
            var = tf.compat.v1.assign(var, base)
            var = tf.compat.v1.scatter_nd_add(var, idxs_nd, read_identity)
            results["write_temp_var_add"] = [
                DestroyTemporaryVariable(ref=var, var_name=var_name)
            ]

        with tf.control_dependencies(results["write_temp_var_update"]):
            var = TemporaryVariable(shape=base.shape, dtype=base.dtype)
            var_name = var.op.name
            var = tf.compat.v1.assign(var, base)
            var = tf.compat.v1.scatter_nd_update(var, idxs_nd, read_identity)
            results["write_temp_var_update"] = [
                DestroyTemporaryVariable(ref=var, var_name=var_name)
            ]

    # change all the results to the same output, to remove i/o discrepancies
    for k, v in results.items():
        with tf.control_dependencies(v):
            results[k] = tf.constant(1)

with tf.compat.v1.Session(graph=graph) as sess:
    feed_dict = {
        base: np.random.uniform(size=base_shape),
        read_identity: np.random.uniform(size=(minibatch_size, read_write_size)),
    }

    # profiler.start()
    sess.run(results, feed_dict=feed_dict)
    # profiler.save("tmp2_profile", profiler.stop())

    for key, vals in results.items():
        print(key)

        # take the minimum over 50 runs, to reduce timing noise
        time = 1e10
        for _ in range(50):
            start = timeit.default_timer()
            sess.run(vals, feed_dict=feed_dict)
            time = min(time, timeit.default_timer() - start)

        print(time)

drasmuss force-pushed the converter2 branch 3 times, most recently from a1a3e11 to 0267ba7 (January 17, 2020)
hunse (Collaborator) commented Jan 20, 2020

I'm getting the following warnings in master:

/home/ehunsber/workspace/nengo-dl/nengo_dl/converter.py:1097: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  broadcast_scale[slices] = scale[i]
/home/ehunsber/workspace/nengo-dl/nengo_dl/converter.py:1098: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  broadcast_bias[slices] = bias[i]

We might want to fix these in this PR; I think it's just a matter of using `tuple(slices)` in those lines.
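
For reference, here is a minimal sketch of the warning and the fix (the array names and shapes are illustrative, not the actual converter code):

import numpy as np

broadcast_scale = np.zeros((2, 3, 4))
scale = np.ones(2)
slices = [slice(0, 1), slice(None), slice(None)]  # indices built up as a list

# broadcast_scale[slices] = scale[0]        # FutureWarning: non-tuple sequence index
broadcast_scale[tuple(slices)] = scale[0]   # indexing with a tuple is the supported form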

drasmuss (Member Author)

That's done in commit c03bd58, and some other instances in commit bab3950 (I just stuck them in with a larger commit as they came up, since the change was so minor).

tbekolay (Member) left a comment:


This all LGTM! Had a few small questions/comments that could lead to changes but also could be left as is, so I'll wait on your response before proceeding @drasmuss.

Review threads (resolved): nengo_dl/tensor_node.py, nengo_dl/converter.py (two outdated threads), CHANGES.rst
Commit messages:

- Store TensorSignal indices as slices instead of full lists of indices (see the first sketch after this list).
- Store initial values of base arrays more efficiently. This can improve the speed of state updates, and likely doesn't make a significant difference to memory size (relative to all the other internal state on the GPU).
- The is_gpu_available function is deprecated.
- Use sys.executable in the TF GPU check, which ensures that the check uses the same Python executable as the source script (see the second sketch after this list).
- The behaviour of the batch_size parameter was changed slightly.
- Some changes were made to the ABR GPU server, which sped things up slightly.
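
As a hypothetical illustration of the indices-as-slices idea (the names below are illustrative, not the actual TensorSignal implementation): when a signal's indices are contiguous, a slice selects the same elements as a full index list but is much more compact, and it lets TensorFlow use a cheap strided-slice read instead of a gather (compare read_slice vs. read_gather in the benchmark above).

import numpy as np

base = np.arange(16384)

idx_list = list(range(4096))  # full list of indices, one Python int per element
idx_slice = slice(0, 4096)    # equivalent compact representation

assert np.array_equal(base[idx_list], base[idx_slice])

And a minimal sketch of the sys.executable idea behind the TF GPU check (the actual check in nengo-dl may differ): launching the query via sys.executable guarantees it runs under the same interpreter, and therefore sees the same installed TensorFlow, as the calling script.

import subprocess
import sys

# run the GPU query in a subprocess under the same Python interpreter
result = subprocess.run(
    [
        sys.executable,
        "-c",
        "import tensorflow as tf; "
        "print(len(tf.config.list_physical_devices('GPU')))",
    ],
    capture_output=True,
    text=True,
)
has_gpu = int(result.stdout.strip()) > 0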
tbekolay (Member) left a comment:


All LGTM now, merging when CI finishes.

tbekolay merged commit e0c3479 into master on Jan 28, 2020
tbekolay deleted the converter2 branch on March 4, 2020