
CTC (Connectionist Temporal Classification) Implementation #4681

Open: wants to merge 5 commits into base: master
Conversation

@ChWick ChWick commented Sep 3, 2016

I implemented a basic CTC algorithm for Caffe: CTCLossLayer for loss and gradient calculation. CTCDecoderLayer for decoding, only a greedy is implemented though.

It is based on the implementation of tensorflow and the paper of A. Graves. Since I mostly transcribed the code of tensorflow you should check if there are any copyright issues.

Moreover, I implemented an additional ReverseLayer which I use for bidirectional recurrent nets (e.g. BLSTMs).

I added a dummy example that shows the basic functionality (besides the tests) by overfitting dummy data.

Added reverse layer (usefull for bidirectional recurrent layers, e.g. BLSTM), finished working on CTC-Loss-Layer, more tests.
Separated forward and backward pass by introducing new intermediate variables (e.g. alpha and beta).
CTCDecoderLayer: added scores and optional accuracy as top blobs. Implemented CTCDecoderLayerTest for GreedyDecoder.
Added parameters to ctc decoder layer into proto.
Added dummy example to ctc examples.
Added an example to show the progress of learning.
Fixed lint errors, made layout changes
cysin commented Nov 1, 2016

Hi, I tried this CTC with the 'image captioning' example, but the loss didn't decrease and it always told me 'no valid path found'. Is there a specific format required for the input data?

ChWick commented Nov 2, 2016

I tried to explain the input data layout in the class description of include/caffe/layers/ctc_loss_layer.hpp.
There are three parallel input blobs, as required for recurrent nets such as LSTMs. The data blob has shape T x N x C, and the sequence indicators are shaped T x N: they contain a 0 at the first time step of a sequence and a 1 while the sequence continues. The label blob is also of shape T x N; each entry is a label index, or -1 once the sequence has finished. Each batch column of length T may hold exactly one sequence, and a single sequence may not exceed the length T.
Note that you also need one extra output in your last layer for the blank label. So if you have 100 labels, the input data blob of the CTC layer must have shape T x N x 101.
If your data is shaped correctly, guard against exploding gradients by adding gradient clipping (a solver parameter) and/or reducing the learning rate.
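The layout described above can be sketched with NumPy. The concrete sizes (T=5, N=2, 3 real labels) and the label values are made up for illustration; only the shapes and sentinel values follow the description:

```python
import numpy as np

# Hypothetical sizes: T time steps, N parallel sequences, 3 real labels.
T, N = 5, 2
num_labels = 3
C = num_labels + 1  # one extra channel for the CTC blank label

# Data blob: T x N x C, e.g. the output of the net's last layer.
data = np.random.rand(T, N, C).astype(np.float32)

# Sequence indicators: 0 at the first time step of a sequence, 1 while it
# continues. Column 0 spans all 5 steps, column 1 only the first 3.
seq_ind = np.array([[0, 0],
                    [1, 1],
                    [1, 1],
                    [1, 0],
                    [1, 0]], dtype=np.float32)

# Label blob: T x N, label indices padded with -1 once the sequence is done.
# Column 0 encodes the label sequence [2, 0, 1], column 1 encodes [1, 2].
labels = np.array([[ 2,  1],
                   [ 0,  2],
                   [ 1, -1],
                   [-1, -1],
                   [-1, -1]], dtype=np.float32)
```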

wellhao commented Feb 10, 2017

Hi, I ran your example. I want to use LSTM+CTC to recognize English words, but it is not very clear how to do this. Could you give me some suggestions? Thank you very much.

@Jenkyrados

A small problem with the BLSTM part of the code: as of right now, the sequence indicators must span the whole sequence to effectively feed the information into the reversed layers.
Otherwise, the reverse layer will not see the information at the beginning of the sequence, because the sequence indicators end prematurely with 0s.

ChWick commented Apr 4, 2017

@Jenkyrados True! I have already handled this issue by adding a "ReverseTimeLayer" that reverses each sequence according to its actual length. The ReverseLayer can then be removed, or kept as a 'mirror' that cannot reverse sequences of different lengths but can still be used to mirror images, etc.
These changes are not in this PR. I did not update the PR branch because the Caffe maintainers do not seem to have taken notice of it.
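A minimal NumPy sketch of what such a length-aware time reversal has to do, assuming time-major T x N data and a per-column list of sequence lengths (the function name and shapes are illustrative, not the actual layer code):

```python
import numpy as np

def reverse_time(data, seq_lengths):
    """Reverse a time-major (T x N x ...) blob along the time axis,
    but only within each sequence's valid length, so that padding
    time steps at the end of short sequences stay in place."""
    out = data.copy()
    for n, length in enumerate(seq_lengths):
        out[:length, n] = data[:length, n][::-1]
    return out

# Toy example: column 0 has full length 5, column 1 only length 3.
T, N = 5, 2
data = np.arange(T * N).reshape(T, N)
rev = reverse_time(data, seq_lengths=[5, 3])
# Column 0 is fully reversed; in column 1 only the first 3 steps are.
```

A plain ReverseLayer that flips all T steps unconditionally would pull the padding to the front of the short sequence, which is exactly the problem described above.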

@Jenkyrados

Hmm, I'm curious how you did it. Is it in a branch of your fork?
I hope the PR gets merged; I personally found it very helpful for end-to-end recognition!

ChWick commented Apr 4, 2017

@Jenkyrados Have a look at my warp-ctc branch: https://github.com/ChWick/caffe/tree/warp-ctc. It also adds a WarpCTC layer that wraps https://github.com/baidu-research/warp-ctc.

@Jenkyrados

Looks good. Thanks a bunch!

06221098 commented Apr 7, 2017

In warp_ctc_layer.cpp there is #include <warp_ctc/ctcpp.h>, but I couldn't find the warp_ctc folder or ctcpp.h in this branch.

ChWick commented Apr 7, 2017

@06221098 If you want to use warp-ctc with Caffe, you need the develop branch of my warp-ctc fork: https://github.com/ChWick/warp-ctc/tree/develop. It adds support for C++ and templates, as required by Caffe. You can compile warp-ctc as a standalone shared library (cmake && make install) and add the installation path to Caffe.

06221098 commented Apr 8, 2017

@ChWick Thank you very much for your prompt response. I am trying it as you suggested.

06221098 commented Apr 8, 2017

@ChWick Hi, ChWick. I compiled warp-ctc-develop with the following steps:
cd warp-ctc-develop
mkdir build
cd build
cmake ../
make
The problem occurs when I execute the last step, "make". I got this log:
Any pointers or suggestions would be helpful. Thanks in advance.

yyy@node5:~/warp-ctc-develop$ mkdir build
yyy@node5:~/warp-ctc-develop$ cd build
yyy@node5:~/warp-ctc-develop/build$ cmake ../
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found CUDA: /usr/local/cuda-7.5 (found suitable version "7.5", minimum required is "6.5")
-- cuda found TRUE
-- Found Torch7 in /home/yyy/torch/install
-- Torch found /home/yyy/torch/install/share/cmake/torch
-- Building shared library with GPU support
-- Building Torch Bindings with GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /home/yyy/warp-ctc-develop/build
yyy@node5:~/warp-ctc-develop/build$ make
[ 20%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/./warpctc_generated_reduce.cu.o
[ 40%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/./warpctc_generated_ctcpp_entrypoint.cu.o
/home/yyy/warp-ctc-develop/src/ctcpp_entrypoint.cu(1): error: this declaration has no storage class or type specifier

/home/yyy/warp-ctc-develop/src/ctcpp_entrypoint.cu(1): error: expected a ";"

2 errors detected in the compilation of "/tmp/tmpxft_00000e27_00000000-16_ctcpp_entrypoint.compute_52.cpp1.ii".
CMake Error at warpctc_generated_ctcpp_entrypoint.cu.o.cmake:264 (message):
Error generating file
/home/yyy/warp-ctc-develop/build/CMakeFiles/warpctc.dir/src/./warpctc_generated_ctcpp_entrypoint.cu.o

make[2]: *** [CMakeFiles/warpctc.dir/src/./warpctc_generated_ctcpp_entrypoint.cu.o] Error 1
make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
make: *** [all] Error 2

06221098 commented Apr 8, 2017

@ChWick Sorry to bother you. I have solved my problems. Thank you very much.

@shengyudingli

@ChWick Excuse me, can you give a simple usage example for the CTCGreedyDecoderLayer? I used this layer on top of a net, but I get strange output; I suppose I used it in the wrong way......

ChWick commented Apr 13, 2017

@shengyudingli
From ctc_decoder_layer.hpp:

// probabilities (T x N x C),
// sequence_indicators (T x N),
// target_sequences (T X N) [optional]
// if a target_sequence is provided, an additional accuracy top blob is
// required
virtual inline int MinBottomBlobs() const { return 2; }
virtual inline int MaxBottomBlobs() const { return 3; }

// sequences (terminated with negative numbers),
// output scores [optional if 2 top blobs and bottom blobs = 2]
// accuracy [optional, if target_sequences as bottom blob = 3]
virtual inline int MinTopBlobs() const { return 1; }
virtual inline int MaxTopBlobs() const { return 3; }

Provide the output of your last InnerProductLayer as the first blob (probabilities), and the sequence indicators used for the LSTM layers as the second blob (for computing the input sequence lengths). Optionally, you can provide a target_sequences blob as the third bottom blob for computing the scores/accuracy.
The greedy decoding algorithm amounts to taking the argmax of the sequence at each time step, then removing blanks and repeated labels.
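That greedy rule can be sketched in a few lines of Python. The blank index is left as a parameter here because where it sits (0 or C-1) depends on how the net was set up; this is an illustration of the rule, not the layer's actual code:

```python
import numpy as np

def ctc_greedy_decode(probs, seq_len, blank):
    """Greedy CTC decoding of one sequence: take the argmax at every
    time step, collapse runs of repeated labels, then drop blanks."""
    best = np.argmax(probs[:seq_len], axis=-1)
    decoded = []
    prev = None
    for label in best:
        # Emit a label only when it differs from the previous step
        # (collapsing repeats) and is not the blank. A blank between two
        # identical labels resets `prev`, so both copies are kept.
        if label != prev and label != blank:
            decoded.append(int(label))
        prev = label
    return decoded

# Toy path over C = 3 classes with blank = 2: argmax path [0, 0, 2, 1, 2, 1]
probs = np.eye(3)[[0, 0, 2, 1, 2, 1]]
print(ctc_greedy_decode(probs, seq_len=6, blank=2))  # [0, 1, 1]
```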
