Input Tensors: High Performance Large Datasets via TFRecords #6928
Conversation
Watching this PR closely, thanks for your efforts @ahundt

We already support fitting from tensors / TFRecords. Please make a case for this PR.
@fchollet Is there a TFRecords example with the `Features` …?
Why modify model.fit() to accept TFRecords instead of writing my own training loop?
I can implement things in a different manner as needed. I can also split this into several smaller pull requests according to the headline changes below.

model.fit(x=None, y=None)
In several places …

Disabling sample weights
Sample_weights in … I chose the …

is_placeholder()
I use is_placeholder() to differentiate numpy inputs from tensorflow op inputs, which allows …
TFRecords Example
There is quite a bit of interest in training models from TFRecords, with a bunch of comments + thumbs in the keras-contrib PR and tensorflow issue. Loading from TFRecords will certainly be faster than from disk, and is suggested in the high performance models docs. The dataset I'm using was released as a TFRecord, so it is quite convenient. I didn't really like this workaround:

```python
Model(inputs=[x_train_input, y_train_in_out], outputs=[x_train_out, y_train_in_out])
```

Is there a better way to supply label values from a TFRecord?

class Function(fetches, feed_dict)
Before this change, auxiliary tf tensor data & operations could almost be provided to the backend `Function`. My motivating API extension example is training on a realtime data source like a robot. I want to supply live images via an op while actively fitting, and log the image via another op. There is no need for the logging op to affect my model/training but the source/sink should run in the same session, and the overhead of moving the images to numpy arrays and serializing would cause frames to drop.
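Regarding the "better way to supply label values" question above: a hedged sketch of the approach that later landed in Keras master (>= 2.0.7), where the label tensor is attached at compile time via `target_tensors` instead of being routed through the model's inputs and outputs. This is the later upstream API, not this PR's; `read_and_decode_recordinput` and `cnn_layers` are the same user-code helpers referenced elsewhere in this thread.

```python
from keras.layers import Input
from keras.models import Model

# Yield ops supplying data and labels (user code, as discussed below).
x_train_batch, y_train_batch = read_and_decode_recordinput(...)

x_train_input = Input(tensor=x_train_batch)
x_train_out = cnn_layers(x_train_input)

# No label Input/output pair needed: the label tensor is bound at compile time.
train_model = Model(inputs=x_train_input, outputs=x_train_out)
train_model.compile(optimizer='rmsprop',
                    loss='categorical_crossentropy',
                    metrics=['accuracy'],
                    target_tensors=[y_train_batch])
train_model.fit(epochs=5, steps_per_epoch=1000)
```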
So currently you are supposed to be able to: …

What changes?
Here are the changes, marked with ==bugfix== and ==new param== in the comments below:

```python
# tf ops that supply dataset images and labels
x_train_batch, y_train_batch = read_and_decode_recordinput(...)

# create a basic cnn
x_train_input = Input(tensor=x_train_batch)
x_train_out = cnn_layers(x_train_input)

# y label batch is input & output
y_train_in_out = Input(tensor=y_train_batch)

# ==bugfix==
# these lines cause a crash without the patch
# because an invalid call equivalent to this is made:
# K.placeholder(dtype=x_train_input)
train_model = Model(inputs=[x_train_input, y_train_in_out],
                    outputs=[x_train_out, y_train_in_out])

# ==new param option==
# model.compile(sample_weight_mode='disabled')

# ==bugfix==
# This call will crash without the patch,
# even if sample_weight_mode=None.
train_model.compile(optimizer='rmsprop',
                    loss='categorical_crossentropy',
                    metrics=['accuracy'],
                    sample_weight_mode='disabled')

# ==new param==
# sess.run(fetches, feed_dict) parameters handled correctly in backend
hello = tf.constant('Hello, TensorFlow!')
train_model.fit(batch_size=batch_size, epochs=300, fetches=hello)
```
It seems right now I've broken more cases than I fixed in the tests... I knew this code section would get pretty complicated. It might be a few days before I can address further code fixes.
If sample weighting cannot be supported, then it should be disabled automatically and a warning raised.
Can you explain `read_and_decode_recordinput()`? What does it do exactly and why is it necessary? Thanks!
Great idea, why didn't I think of that? Thanks!
It is not part of the API, it is user code creating the tf ops that feed data in mnist_tensorflow.py, read_and_decode_recordinput(). I'll explain in more detail in the datasets pull request, because it is most relevant there.
You may slap your head when you read this one, but no worries I'm happy to clarify! :-)
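The function itself lives in the example file rather than in this thread, so here is a minimal hypothetical sketch of what a `read_and_decode_recordinput()` along these lines can look like, assuming TF 1.x and an MNIST-style feature schema (the feature names and shapes are illustrative assumptions, not the PR's exact code):

```python
import tensorflow as tf
from tensorflow.python.ops import data_flow_ops


def read_and_decode_recordinput(tfrecord_pattern, batch_size=32):
    """Build yield ops that supply image and label batches from TFRecords."""
    # RecordInput is a yield op: each run returns raw serialized records
    # directly from the C++ side, with no feed_dict involved.
    record_input = data_flow_ops.RecordInput(file_pattern=tfrecord_pattern,
                                             batch_size=batch_size)
    records_op = record_input.get_yield_op()
    records_op = tf.split(records_op, batch_size, 0)
    records_op = [tf.reshape(record, []) for record in records_op]

    images, labels = [], []
    for serialized_example in records_op:
        # Assumed feature schema; adjust to the actual TFRecord contents.
        features = tf.parse_single_example(
            serialized_example,
            features={
                'image_raw': tf.FixedLenFeature([], tf.string),
                'label': tf.FixedLenFeature([], tf.int64),
            })
        image = tf.decode_raw(features['image_raw'], tf.uint8)
        image = tf.cast(tf.reshape(image, [28, 28, 1]), tf.float32) / 255.0
        label = tf.one_hot(tf.cast(features['label'], tf.int32), 10)
        images.append(image)
        labels.append(label)

    return tf.stack(images), tf.stack(labels)
```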
What it does is forward `fetches` and `feed_dict` on to the underlying `tf.Session.run()` call. Here are some first pass doc lines:

```python
history = model.fit(fetches, feed_dict)
```

```python
"""
# Arguments
fetches: TensorFlow backend only. A single TensorFlow graph element,
a list of graph elements, or a dictionary whose values are graph elements
or lists of graph elements. Passes tensor ops to `tf.Session.run()` and
returns an additional tensor tuple upon completion. This is for advanced
users that require auxiliary processing as fit runs. When provided,
additional tensor values are returned:
`history, tensors = model.fit(fetches, feed_dict)`
See https://www.tensorflow.org/api_docs/python/tf/Session for more details
on this parameter.
feed_dict: TensorFlow backend only. A dictionary that maps TensorFlow graph
elements to values. This is for advanced users that require auxiliary
processing as `model.fit()` runs.
See https://www.tensorflow.org/api_docs/python/tf/Session for more details on this parameter.
""" Motivation: I want to supply live images via an op while actively fitting, and log the image via another op. Details are in my previous post #6928 (comment). |
* tensorflow_backend_Function:
  * backend_test.py cleanup new TF backend test
  * backend_test.py rename f to g to fix test error
  * typo fix
  * test_backend.py K.backend.function() error handling test
  * backend_test.py reverse change
  * test keras.backend.Function() input handling
  * backend test import zip_longest from six.moves for keras-team#7064
  * tensorflow_backend.py remove self.feed_to_fetch_count
  * tensorflow_backend Function() corner case handling and error checking
  * Conflicts: keras/backend/tensorflow_backend.py, tests/keras/backend/backend_test.py
* commit '41fdd5aa5fe080bf5fa12638b9abc9f50d835a39':
  * training.py improve docstrings and error case
  * fix docstring comments from review (keras-team#7113)
* generic_utils.py don't crash when dealing with batched data
* Progbar() unit test
* mnist_tfrecord.py added (#7061, #7072, #6928, #7046)
* Fix mnist_tfrecord.py runtime errors
* mnist_tfrecord.py pep8
* mnist_tfrecord.py add parallelism option
* reorder inputs
* mnist_tfrecord.py indentation fix
* lower batch size and epochs
* loss defaults to None in compile()
* mnist_tfrecord.py
* model.fit(steps_per_epoch) added
* added _check_num_samples for cases when batch_size does not apply
* fix test failures
* remove inaccurate warning
* improved fit(steps_per_epoch) with separate internal epoch loop in _fit_loop
* fit(steps_per_epoch) initial validation support
* training.py pep8
* mnist_tfrecord.py fix key missing lines
* mnist_tfrecord.py add coordinator
* removed extraneous line
* mnist_tfrecord.py and training.py clean up based on review (#7113)
* mnist_tfrecord.py extended description
* training.py fix test error
* mnist_tfrecord.py and training.py fixed review comments, docs, and error messages (#7113)
* training.py fix unit test error for steps_per_epoch
* fix docstring comments from review (#7113)
* training.py improve docstrings and error case
* master:
  * model.fit(steps_per_epoch), mnist_tfrecord.py, progbar np.mean (keras-team#7113)
  * Change default size to allow different MobileNet sizes (keras-team#7586)
  * cntk backend: fix the reversed rnn bug (keras-team#7593)
  * Fix mask for multi output --> multi inputs (keras-team#7591)
  * [RELNOTES] Subtract merge layer (keras-team#7573)
  * update docker with cntk 2.1 (keras-team#7565)
  * [RELNOTES] Move constraint management to be based on variable attributes (like TF). (keras-team#7564)
  * Add handling for `dtype` arg in initializer config. Fix keras-team#7550. (keras-team#7552)
  * remove unnecessary function definition (keras-team#7542)
  * refactor the function - _convert_string_dtype (keras-team#7540)
  * Add batchnorm tests for CNTK (keras-team#7534)
  * Clean up RNN tests (keras-team#7529)
  * Add merge tests (keras-team#7533)
  * Conflicts: examples/mnist_tfrecord.py, keras/engine/training.py, keras/utils/generic_utils.py, tests/keras/engine/test_training.py
```diff
-    def predict(self, x, batch_size=32, verbose=0):
+    def predict(self, x=None, batch_size=32, verbose=0, steps=None):
         """Generates output predictions for the input samples.

         Computation is done in batches.

         # Arguments
             x: the input data, as a Numpy array
```
Review comment on the `x:` doc line: it should also say "or None", since `x` may now be omitted.
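A small hypothetical usage sketch of the new signature, assuming a model whose input is already wired to a data tensor as in the earlier snippets:

```python
# With an input tensor feeding the model, `x` stays None and `steps`
# controls how many batches are drawn from the tensor op.
predictions = train_model.predict(x=None, steps=10)
```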
```diff
@@ -159,6 +159,8 @@ def test_is_keras_tensor(self):
         assert k.is_keras_tensor(keras_var) is False
         keras_placeholder = k.placeholder(shape=(2, 4, 5))
         assert k.is_keras_tensor(keras_placeholder) is False
+        assert getattr(keras_placeholder, 'is_keras_placeholder', False) is True
```
Add a test for `K.is_keras_tensor(..., expect_other_types=True)`.
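For context on what the `is_keras_placeholder` attribute enables, a hypothetical sketch of the numpy-vs-tensor-input differentiation described earlier in the thread (the helper and variable names are illustrative, not the PR's exact internals):

```python
def is_placeholder(tensor):
    # Placeholders must be fed via feed_dict; tensor/yield-op inputs
    # supply their own data when the graph runs.
    return getattr(tensor, 'is_keras_placeholder', False)

feed_inputs = [t for t in model.inputs if is_placeholder(t)]
tensor_inputs = [t for t in model.inputs if not is_placeholder(t)]
```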
This PR is rather massive (1000+ lines) and I think it contains a lot of changes that don't seem necessary in order to implement the "data tensor action plan". In particular:

- `is_keras_tensor(x, expect_other_types=False)` (it's also a rather strange API -- try asking a random user what they expect this function to do, given its signature)
- Changes related to the `is_keras_placeholder` attribute on both layers and tensors
- Changes to the `Function` API aren't immediately related to data tensors support

I would recommend simplifying, and focusing on:

- Supporting the `steps` argument in `evaluate`/`predict`. This is a self-contained change.
- Supporting recompilation if a target data tensor is passed to `fit`. This is also a self-contained change.

These can be done in two PRs. Both would only involve changes in `training.py` as far as I can tell. I would expect these changes to end up at ~30-50% of the total size of this PR.
Update (2017-08-24): After further testing, master is still broken for my real use cases. At this time I still recommend using this PR over master if you wish to use input tensors.
What's missing currently is the ability to pass target tensors in `fit` (which would be handled via temporary recompilation, with caching of the default target placeholders).

On 22 August 2017 at 13:58, Andrew Hundt wrote:

> Much of the functionality in this PR has been reimplemented and merged on
> master as of 2.0.7. I don't have cycles right now but I plan on combing
> through this to separate any key tests and other functionality out.
> However, at this point I suggest using the examples in master, and if
> anything is broken we can make a unit test so the changes can be
> reimplemented.
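A hypothetical sketch of the missing capability described above, i.e. a target tensor passed straight to `fit` with temporary recompilation under the hood; the call shape is illustrative, since this API was not merged in this form:

```python
# Proposed behavior: passing a tensor as `y` would trigger a temporary
# recompile that swaps the cached target placeholder for the supplied tensor.
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit(x=None, y=y_train_batch, epochs=5, steps_per_epoch=1000)
```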
@fchollet For some reason I didn't see your recent comment #6928 (review) until now; I'm sorry about that.
There is definitely a small amount of code cleanup that could be left out. I may be mistaken, but I suspect that something like ~80% of these changes will prove necessary in practice. This is because there are several use cases not yet supported in master which I may not have elucidated very clearly yet, and I apologize for that.
I agree it is not pretty; what about renaming the parameter to …? I believe the most correct behavior of … This could easily be a separate PR; are you open to that possibility?
The zipping of lists and moving lists between tensor input and the feed dict is required for it to work, even with basic use cases. Only the … Running https://gist.github.com/ahundt/5c2c7f5a324171bbbc8a5b622f2162c5, it will start at this error, tested on master 2017-08-22 at 9pm EST:
The solution to fix the error at the above trace is #7067, but it will reveal another issue as soon as it is fixed. Pull on that thread long enough and you will eventually be patching …

Unfortunately, I think a number of the …
How do we handle multi-tensor-output models?
@TimZaman You should save each example proto in your map separately, and read each feature out of your map separately. See how there are separate image, height and width features in …
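A hedged sketch of that advice, assuming TF 1.x: each feature goes into the `Example` proto under its own key so it can be parsed out independently (the exact key names are illustrative):

```python
import tensorflow as tf


def write_example(writer, image_bytes, height, width, label):
    """Store each feature separately so each can be read out on its own."""
    example = tf.train.Example(features=tf.train.Features(feature={
        'image_raw': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        'height': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[height])),
        'width': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[width])),
        'label': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }))
    writer.write(example.SerializeToString())


# Usage: with tf.python_io.TFRecordWriter('mnist.tfrecord') as writer:
#            write_example(writer, img.tostring(), 28, 28, 7)
```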
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
It is possible to make do with the `model.compile(input_tensors)` parameter, and I don't have the free cycles to keep merging with master on an ongoing basis, so I'm closing this. If someone else has interest in getting these features integrated, please let me know and I'll be happy to help.
Implements the Input Tensor API detailed in #7102 (comment)

Update (2017-08-08): Two supported use cases based on reviews: … API usage, with a working mnist_tfrecord.py implementation.

Summary
This PR adds support for yield ops to Keras plus an example utilizing TFRecords. Correct support for yield ops in `Model` adds valuable functionality not currently supported by Keras, for the reasons detailed below. It re-compiles on demand when tensors are passed to `y` in `model.fit()`.

Yield ops
Yield ops, aka data tensors, such as RecordInput (test code), are different from `tf.Variable` because they provide data entirely on the C++ side when run without `fetches` or `feed_dict`, and are thus extremely efficient for large data like images.

Changes
Here are the changes, marked with ==bugfix== and ==new param== in the comments below: … There are extended unit tests and support for Input Tensors with each of the following APIs given an Input Tensor (aka yield op) `x` and `y`: …

TFRecord
TFRecord support is a side effect and key motivator of yield op support, and examples/mnist_tfrecord.py demonstrates usage.

Update: I've moved the `fetches` and `feed_dict` public API design into #6974. This PR now focuses more narrowly on supporting an input tensor `yield_op` as a parameter.

Performance Update
This latest version runs mnist_tfrecord.py twice as fast as it did previously!