Converter bugfixes/improvements #119
Conversation
Note for posterity: in case it is useful in the future, here is a benchmark script I made to test the cost of different read/write types:

```python
from collections import defaultdict
import timeit

import numpy as np
import tensorflow as tf
from tensorflow.python.eager import profiler
from tensorflow.python.ops.gen_state_ops import (
    TemporaryVariable,
    DestroyTemporaryVariable,
)

tf.config.optimizer.set_experimental_options({"disable_meta_optimizer": True})

minibatch_size = 64
base_shape = (minibatch_size, 16384)
read_write_size = 4096
reps = 1000

with tf.Graph().as_default() as graph:
    results = defaultdict(list)

    idxs = tf.constant(
        np.random.uniform(0, base_shape[1], size=read_write_size), dtype=tf.int32
    )
    idxs_nd = tf.stack(
        tf.meshgrid(tf.range(minibatch_size, dtype=tf.int32), idxs, indexing="ij"),
        axis=-1,
    )
    base = tf.compat.v1.placeholder(shape=base_shape, dtype=tf.float32)
    read_identity = tf.compat.v1.placeholder(
        shape=(minibatch_size, read_write_size), dtype=tf.float32
    )

    for i in range(reps):
        with tf.control_dependencies(results["read_identity"]):
            results["read_identity"] = [read_identity]

        with tf.control_dependencies(results["read_slice"]):
            results["read_slice"] = [
                tf.strided_slice(base, [0, 0], [minibatch_size, read_write_size])
            ]

        with tf.control_dependencies(results["read_gather"]):
            results["read_gather"] = [tf.gather(base, idxs, axis=1)]

        with tf.control_dependencies(results["read_slice_concat"]):
            results["read_slice_concat"] = [
                tf.concat(
                    [
                        tf.strided_slice(
                            base, [0, 0], [minibatch_size, read_write_size // 2]
                        ),
                        tf.strided_slice(
                            base,
                            [0, base_shape[1] - read_write_size // 2],
                            [minibatch_size, base_shape[1]],
                        ),
                    ],
                    axis=1,
                )
            ]

        with tf.control_dependencies(results["write_assign"]):
            results["write_assign"] = [read_identity]

        with tf.control_dependencies(results["write_assign_add"]):
            if i == 0:
                results["write_assign_add"] = [read_identity]
            else:
                results["write_assign_add"] = [
                    results["write_assign_add"][0] + read_identity
                ]

        with tf.control_dependencies(results["write_scatter_add"]):
            results["write_scatter_add"] = [
                tf.tensor_scatter_nd_add(base, idxs_nd, read_identity)
            ]

        with tf.control_dependencies(results["write_scatter_update"]):
            results["write_scatter_update"] = [
                tf.tensor_scatter_nd_update(base, idxs_nd, read_identity)
            ]

        with tf.control_dependencies(results["write_temp_var_add"]):
            var = TemporaryVariable(shape=base.shape, dtype=base.dtype)
            var_name = var.op.name
            var = tf.compat.v1.assign(var, base)
            var = tf.compat.v1.scatter_nd_add(var, idxs_nd, read_identity)
            results["write_temp_var_add"] = [
                DestroyTemporaryVariable(ref=var, var_name=var_name)
            ]

        with tf.control_dependencies(results["write_temp_var_update"]):
            var = TemporaryVariable(shape=base.shape, dtype=base.dtype)
            var_name = var.op.name
            var = tf.compat.v1.assign(var, base)
            var = tf.compat.v1.scatter_nd_update(var, idxs_nd, read_identity)
            results["write_temp_var_update"] = [
                DestroyTemporaryVariable(ref=var, var_name=var_name)
            ]

    # change all the results to the same output, to remove i/o discrepancies
    for k, v in results.items():
        with tf.control_dependencies(v):
            results[k] = tf.constant(1)

with tf.compat.v1.Session(graph=graph) as sess:
    feed_dict = {
        base: np.random.uniform(size=base_shape),
        read_identity: np.random.uniform(size=(minibatch_size, read_write_size)),
    }

    # profiler.start()
    sess.run(results, feed_dict=feed_dict)
    # profiler.save("tmp2_profile", profiler.stop())

    for key, vals in results.items():
        print(key)
        time = 1e10
        for _ in range(50):
            start = timeit.default_timer()
            sess.run(vals, feed_dict=feed_dict)
            time = min(time, timeit.default_timer() - start)
        print(time)
```
I'm getting the following warnings in
Might want to fix these here. I think it's just a matter of using
This all LGTM! Had a few small questions/comments that could lead to changes but also could be left as is, so I'll wait on your response before proceeding @drasmuss.
Update default CI distribution to xenial and test against bionic. Update copyright dates to 2020.
TensorFlow 2.1 is built with VS 2019.
The axis argument was being ignored.
Store TensorSignal indices as slices instead of full lists of indices. Store initial values of base arrays more efficiently.
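The idea of storing indices as slices can be illustrated with a small helper. This is a hypothetical sketch (the `indices_to_slice` name and its exact behaviour are assumptions, not the actual TensorSignal implementation): evenly spaced index lists collapse to a `(start, stop, step)` slice, while irregular ones cannot be represented that way.

```python
import numpy as np


def indices_to_slice(indices):
    """Return an equivalent slice if `indices` are evenly spaced, else None.

    Hypothetical helper showing how a contiguous/strided index list can be
    stored as a compact slice instead of a full array of indices.
    """
    indices = np.asarray(indices)
    if len(indices) == 1:
        # a single index is the trivial slice [i, i+1)
        return slice(int(indices[0]), int(indices[0]) + 1)
    steps = np.diff(indices)
    if len(steps) > 0 and np.all(steps == steps[0]) and steps[0] > 0:
        # constant positive stride: representable as start/stop/step
        return slice(int(indices[0]), int(indices[-1]) + 1, int(steps[0]))
    return None  # irregular spacing; must keep the full index list


# e.g. [2, 4, 6, 8] -> slice(2, 9, 2); [0, 3, 4] -> None
```

Three ints per signal instead of one int per element is both smaller and lets downstream code use cheap strided reads rather than gathers.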
Can improve the speed of state updates, and likely doesn't make a significant difference to the memory size (relative to all the other internal state on the GPU).
The is_gpu_available function is deprecated. Use sys.executable in the TF GPU check, which ensures that the check uses the same Python executable as the source script.
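The sys.executable approach can be sketched roughly as follows. This is a minimal illustration, not the actual implementation; the `tf_gpu_available` name is made up here, and it assumes the non-deprecated `tf.config.list_physical_devices` API as the replacement check.

```python
import subprocess
import sys


def tf_gpu_available():
    """Check for TensorFlow GPU support in a subprocess.

    Launching the check through sys.executable guarantees it runs under the
    same Python interpreter (and therefore the same TensorFlow install) as
    the calling script, rather than whatever `python` is on the PATH.
    """
    result = subprocess.run(
        [
            sys.executable,
            "-c",
            "import tensorflow as tf; "
            "print(len(tf.config.list_physical_devices('GPU')) > 0)",
        ],
        capture_output=True,
        text=True,
    )
    # if the import fails (no TF installed), stdout is empty -> False
    return "True" in result.stdout
```

Running the check in a subprocess also avoids importing TensorFlow (and initializing CUDA) in the parent process just to answer a yes/no question.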
The behaviour of the batch_size parameter was changed slightly.
Some changes were made to the ABR GPU server which sped things up slightly.
All LGTM now, merging when CI finishes.
Fixing some issues found in converter for more complex models, and improving performance of converted networks.