[WIP] initial TensorFlow TFRecord integration with Keras #27
About TFRecords: https://www.tensorflow.org/api_guides/python/python_io Code Source: https://github.com/indraforyou/keras_tfrecord Primary Author: Indranil Sur @indraforyou License: MIT (same as Keras)
@farizrahman4u How do I configure this code to pass on the Theano backend? This code is a Keras API for TensorFlow TFRecords, so even the Theano backend will need access to TensorFlow by definition. Perhaps something like what's done in keras/backend/__init__.py?
By definition, this would be a feature that can only be used with the TensorFlow backend. I don't know whether it's been decided that contrib will allow features that only work on one backend. The only Keras functionality that fits that bill is the SeparableConv layer (it's TF only currently).
Well, in theory it could probably run on the Theano backend if TF is available strictly for dataset loading, but that should probably be addressed when there is a user who needs it. @patyork I'm not sure what changes to make; what's the best way to skip this file when the TensorFlow backend is not available? Also, is the proposed datasets location appropriate? I was split between there and the backends folder...
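One way to skip TensorFlow-only tests, sketched with the standard library only. This is an assumption about how it could be done, not what the PR does: `module_available` and `TFRecordTest` are hypothetical names, and "TensorFlow is usable" is approximated by "the `tensorflow` package can be found", without checking which Keras backend is active.

```python
import importlib.util
import unittest


def module_available(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# Hypothetical test case: the whole class is skipped when TensorFlow is absent,
# so a Theano-only environment reports a skip instead of an import error.
@unittest.skipUnless(module_available("tensorflow"), "requires TensorFlow")
class TFRecordTest(unittest.TestCase):
    def test_tensorflow_present(self):
        # Real TFRecord assertions would go here; TensorFlow should be
        # imported inside the test body, never at module level.
        self.assertTrue(module_available("tensorflow"))


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TFRecordTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping the TensorFlow import inside the test body matters as much as the decorator: a module-level import would fail at collection time, before any skip logic runs.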
It would run with the Theano backend in very specific circumstances, where the performance impact would negate any benefits that may or may not come from having the image preprocessing in an unusable graph.
And no, any backend-specific code needs to be in the corresponding Also, your main problem is a compilation error: |
Are you saying I should move tfrecord.py directly into keras_contrib/backend/tensorflow_backend.py before it can be merged? Not completely sure that's the best fit yet, but I could do that. All TFRecord code will depend on TF unless someone implemented a python+protobuf only backend for loading them, not actually that hard but also probably not high priority for anyone. I might implement something that converts TFRecord datasets to the more commonly used keras dataset formats though.
Thanks, I don't know how I missed that one. |
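For the Python-only loader idea mentioned above, a minimal sketch of reading the TFRecord framing without TensorFlow. Each record is stored as a little-endian uint64 payload length, a uint32 checksum of the length, the payload bytes, and a uint32 checksum of the payload. `iter_tfrecord_payloads` is a hypothetical name. The sketch skips checksum verification (the checksums are masked CRC32C, which the standard library does not provide) and does not decode the payload, which is typically a serialized `tf.train.Example` protobuf.

```python
import struct


def iter_tfrecord_payloads(stream):
    """Yield the raw payload bytes of each record in a binary TFRecord stream.

    Checksums are read but NOT verified in this sketch.
    """
    while True:
        # Header: uint64 payload length + uint32 checksum of that length.
        header = stream.read(12)
        if not header:
            return  # clean end of file
        if len(header) != 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)  # uint32 checksum of the payload
        if len(payload) != length or len(footer) != 4:
            raise ValueError("truncated record body")
        yield payload
```

A file opened with `open(path, "rb")` can be passed as `stream`. Turning the payloads into arrays would still need the protobuf definitions, which is the part that makes a TensorFlow-free loader more than a few lines.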
I stand by this statement; how you go about moving all references to TensorFlow/ If you do manage to split up the code, I'd imagine the TF specific stuff goes into the TF backend, and any usage of that functionality might be in the |
No problem! I'll move it right over.
I plan to write/integrate a couple of specific dataset loaders over the next few weeks. One of those is only available as TFRecords. |
Before adding tests, is this the best design? Specifically, is there perhaps a way the existing |
I would argue it's a bad design. I quickly got the code working to meet @ahundt's bounty deadline. Rewrite of both I would suggest to make an extra param say |
@indraforyou so you're suggesting that for upstream Keras-2? I don't think a parameter explicitly named tfrecord would necessarily be ideal, since it wouldn't help with the next format added. Probably the best option in the end would be like what's done for the TensorFlow backend, where a data format backend could be incorporated and specialized for HDF5 and TFRecord. However, data format backends seem much larger in scope. Perhaps the current basic function is sufficient for the moment and a separate issue should be created until there is both more interest and a dataset available?
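The "data format backend" idea above could look roughly like a registry of loaders keyed by format name, so that adding a format does not require a new parameter. This is an illustrative sketch only: `register_format`, `load_dataset`, and the two placeholder loaders are hypothetical names, not part of Keras or keras-contrib, and the loaders return a tagged tuple instead of reading real data.

```python
# Hypothetical registry mapping a format name to a loader function.
_LOADERS = {}


def register_format(name):
    """Decorator that registers a loader under the given format name."""
    def decorator(loader):
        _LOADERS[name] = loader
        return loader
    return decorator


def load_dataset(path, data_format):
    """Dispatch to the loader registered for data_format."""
    try:
        loader = _LOADERS[data_format]
    except KeyError:
        known = ", ".join(sorted(_LOADERS))
        raise ValueError(
            "unknown data format %r (known: %s)" % (data_format, known)
        )
    return loader(path)


@register_format("hdf5")
def _load_hdf5(path):
    # Placeholder: a real loader would open the file, e.g. with h5py.
    return ("hdf5", path)


@register_format("tfrecord")
def _load_tfrecord(path):
    # Placeholder: a real loader would build TensorFlow input tensors.
    return ("tfrecord", path)
```

With this shape, the training API would only need a single `data_format`-style argument, and a third format is one more registered function.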
Nice work! This feature is definitely interesting since it's a problem in a lot of organizations. HDF5 and tfrecord seem to be the 2 formats of choice. I will try to review the PR today and add some comments. The integration in a clean way seems tricky. |
@farizrahman4u that merge seemed to break travis |
Could we merge this PR once it is repaired and warn users that it is an experimental feature? That way we will be able to test TFRecord with Keras and see where we could integrate this functionality into the core code. What do you think?
@tboquet I would because I've tested it manually, but I can't accept pull requests. If you want to try out the code you can clone it from my fork or do it with github's hub command line command. https://github.com/github/hub |
Thank you @indraforyou, @ahundt for this excellent work. This was much needed for my project and it works like a charm. This patch was also one of the reasons I switched from TFLearn to Keras around 5 days ago. I have a few questions about extending this work further.
# Conflicts: # keras_contrib/backend/tensorflow_backend.py
# Conflicts: # keras_contrib/backend/tensorflow_backend.py
Do you think it is possible to move all the new functions from the TensorFlow backend to another module? It would be easier to manage, I guess. I'm not sure about this, so let's discuss it.
@@ -153,6 +158,309 @@ def depth_to_space(input, scale, data_format=None):
    out = _postprocess_conv2d_output(out, data_format)
    return out


def data_to_tfrecord(images, labels, filename):
I would rename this images_to_tfrecord. What do you think?
Hi @ahundt, what's the status on this? These are nice functionalities especially with the |
@tboquet I think it needs some reworking. I tried to discuss the next steps at tensorflow/tensorflow#8787 but haven't really heard back. |
hi @ahundt |
I've done a second round of this upstream, I'd appreciate some testing + fixes + review from anyone who is interested: I'll probably close this pull request soon since keras-team/keras#6928 will be a significant improvement once it works. |
@ahundt sounds good! |
keras-team/keras#6928 now passes the unit tests so I'm closing this. |