Very rudimentary TensorFlow implementation of C3D on the UCF11 video dataset
Original paper: "Learning Spatiotemporal Features with 3D Convolutional Networks"
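The key idea in C3D is that the convolution kernel slides along the time axis as well as height and width. This means each output value mixes information from several consecutive frames. The snippet below is a minimal NumPy illustration of that idea and is not the code in this repo. `conv3d_valid` is a hypothetical helper covering a single channel with no padding, computed as cross-correlation (which is what deep learning frameworks call "convolution"). In TensorFlow you would use `tf.nn.conv3d` or `tf.keras.layers.Conv3D` instead. The 3x3x3 kernel size follows the original paper.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive single-channel 3D cross-correlation with 'valid' padding (hypothetical helper).

    volume: array of shape (T, H, W); kernel: array of shape (kt, kh, kw).
    Returns an array of shape (T - kt + 1, H - kh + 1, W - kw + 1).
    """
    T, H, W = volume.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1), dtype=np.float32)
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # The window spans kt frames, so each output mixes time and space.
                out[t, i, j] = np.sum(volume[t:t + kt, i:i + kh, j:j + kw] * kernel)
    return out

clip = np.random.rand(30, 64, 64).astype(np.float32)  # 30 frames of 64x64, one channel
kernel = np.ones((3, 3, 3), dtype=np.float32) / 27.0   # 3x3x3 averaging kernel
features = conv3d_valid(clip, kernel)
print(features.shape)  # (28, 62, 62)
```

Because there is no padding, every axis shrinks by `kernel_size - 1`. Real C3D layers use "same" padding and many channels, so their output shapes differ.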
This is the final project for RPI's ECSE DL course. Hail Qiang Ji!!!!
The notes below should be enough to give you a taste of this oversimplified C3D implementation.
- The code includes both training and testing/plotting. Feel free to modify it (e.g., with tf.train.Saver) for fine-tuning and testing.
- Due to memory constraints, I load the data as fp16 instead of fp32. If you have more than 16 GB of memory, feel free to use fp32.
- Convergence time with batch size 5 and 10 epochs on an RTX 2080: under 20 minutes.
- Pickle file for data:
  - Data shape: 10 (batch) x 30 (frames per video sequence) x 64 x 64 x 3 (image size)
- The PDF summarizes the design choices, dataset split, parameter choices, etc. I did not really fine-tune the hyperparameters.
- Major changes relative to the model proposed in the original paper: see the PDF.
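The sketch below shows how much the fp16 choice saves for one chunk with the stated data shape, and how such a chunk could go through pickle. The pickle layout used by this repo is not documented here, so everything in it is an assumption. The data is synthetic, the round-trip happens in memory, and the step that casts a batch of 5 back to fp32 is only one possible way to feed the model. None of it is taken from the repo.

```python
import io
import pickle

import numpy as np

# Synthetic stand-in for one pickled chunk (assumed layout):
# 10 (batch) x 30 (frames) x 64 x 64 x 3 (image size), scaled to [0, 1].
clips = np.random.randint(0, 256, size=(10, 30, 64, 64, 3)).astype(np.float32) / 255.0

# Storing as fp16 halves the memory footprint of the array.
clips_fp16 = clips.astype(np.float16)
print(clips.nbytes, clips_fp16.nbytes)  # 14745600 7372800

# Round-trip through pickle (in memory here; a real run would read a file).
buffer = io.BytesIO()
pickle.dump(clips_fp16, buffer)
buffer.seek(0)
loaded = pickle.load(buffer)

# Hypothetical step: cast a mini-batch of 5 back to fp32 right before training.
batch = loaded[:5].astype(np.float32)
print(loaded.dtype, batch.shape)  # float16 (5, 30, 64, 64, 3)
```

For pixel values in [0, 1], the fp16 rounding error stays below about 5e-4. That loss is usually small next to the 2x memory saving.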