
CTC (Connectionist Temporal Classification) Implementation #4681

Open: wants to merge 5 commits into base: master
Conversation

@ChWick ChWick commented Sep 3, 2016

I implemented a basic CTC algorithm for Caffe: CTCLossLayer for loss and gradient calculation. CTCDecoderLayer for decoding, only a greedy is implemented though.

It is based on the implementation of tensorflow and the paper of A. Graves. Since I mostly transcribed the code of tensorflow you should check if there are any copyright issues.

Moreover, I implemented an additional ReverseLayer which I use for bidirectional recurrent nets (e.g. BLSTMs).

I added a dummy example that shows the basic functionality (besides the tests) by overfitting dummy data.

Added reverse layer (usefull for bidirectional recurrent layers, e.g. BLSTM), finished working on CTC-Loss-Layer, more tests.
Separated forward and backward pass by introducing new intermediate variables (e.g. alpha and beta).
CTCDecoderLayer: added scores and optional accuracy as top blobs. Implemented CTCDecoderLayerTest for GreedyDecoder.
Added parameters to ctc decoder layer into proto.
Added dummy example to ctc examples.
Added an example to show the progress of learning.
Fixed lint errors, made layout changes
cysin commented Nov 1, 2016

Hi, I tried this CTC with the 'image captioning' example, but the loss didn't decrease and it always told me 'no valid path found'. Is there a specific format required for the input data?

ChWick commented Nov 2, 2016

I tried to explain the input data layout in the class description of include/caffe/layers/ctc_loss_layer.hpp.
There are three parallel input blobs, as required for recurrent nets such as LSTMs. The data blob has shape T x N x C, and the sequence indicators are shaped T x N: they contain a 0 at the first time step of a sequence and a 1 while the sequence continues. The label blob is also of shape T x N; each entry is a label index, or -1 once the sequence has finished. Each batch column of length T may hold exactly one sequence, and a single sequence may not exceed the length T.
Note that you also need one extra output in your last layer for the blank label. So if you have 100 labels, the input data blob of the CTC layer must have shape T x N x 101.
If your data is shaped correctly, guard against exploding gradients by adding gradient clipping (a solver parameter) and/or reducing the learning rate.
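The layout described above can be sketched with NumPy. The concrete sizes (T=5, N=2, 3 real labels) and the label values are made up for illustration; only the shapes and sentinel values follow the description:

```python
import numpy as np

# Hypothetical sizes: T time steps, N parallel sequences, 3 real labels.
T, N = 5, 2
num_labels = 3
C = num_labels + 1  # one extra channel for the CTC blank label

# Data blob: T x N x C, e.g. the output of the net's last layer.
data = np.random.rand(T, N, C).astype(np.float32)

# Sequence indicators: 0 at the first time step of a sequence, 1 while it
# continues. Column 0 spans all 5 steps, column 1 only the first 3.
seq_ind = np.array([[0, 0],
                    [1, 1],
                    [1, 1],
                    [1, 0],
                    [1, 0]], dtype=np.float32)

# Label blob: T x N, label indices padded with -1 once the sequence is done.
# Column 0 encodes the label sequence [2, 0, 1], column 1 encodes [1, 2].
labels = np.array([[ 2,  1],
                   [ 0,  2],
                   [ 1, -1],
                   [-1, -1],
                   [-1, -1]], dtype=np.float32)
```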

wellhao commented Feb 10, 2017

Hi, I ran your example. I want to use LSTM+CTC to recognize English words, but it is not very clear how to do this. Could you give me some suggestions? Thank you very much.

@Jenkyrados

A small problem with the BLSTM part of the code: as of right now, the sequence indicators must span the whole sequence to effectively feed the information into the reversed layers.
Otherwise, the reverse layer will not see the information at the beginning of the sequence, because the sequence indicators end prematurely with 0s.

ChWick commented Apr 4, 2017

@Jenkyrados True! I have already handled this issue by adding a "ReverseTimeLayer" that reverses each sequence according to its actual length. The ReverseLayer can then be removed, or kept as a 'mirror' that cannot reverse sequences of different lengths but can still be used to mirror images, etc.
These changes are not in this PR. I did not update the PR branch because the Caffe maintainers do not seem to have taken notice of it.
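A minimal NumPy sketch of what such a length-aware time reversal has to do, assuming time-major T x N data and a per-column list of sequence lengths (the function name and shapes are illustrative, not the actual layer code):

```python
import numpy as np

def reverse_time(data, seq_lengths):
    """Reverse a time-major (T x N x ...) blob along the time axis,
    but only within each sequence's valid length, so that padding
    time steps at the end of short sequences stay in place."""
    out = data.copy()
    for n, length in enumerate(seq_lengths):
        out[:length, n] = data[:length, n][::-1]
    return out

# Toy example: column 0 has full length 5, column 1 only length 3.
T, N = 5, 2
data = np.arange(T * N).reshape(T, N)
rev = reverse_time(data, seq_lengths=[5, 3])
# Column 0 is fully reversed; in column 1 only the first 3 steps are.
```

A plain ReverseLayer that flips all T steps unconditionally would pull the padding to the front of the short sequence, which is exactly the problem described above.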

@Jenkyrados

Hmm, I'm curious how you did it. Is it in a branch of your fork?
I hope the PR gets merged; I personally found it very helpful for end-to-end recognition!

ChWick commented Apr 4, 2017

@Jenkyrados Have a look at my warp-ctc branch: https://github.com/ChWick/caffe/tree/warp-ctc. It also adds a WarpCTC layer that wraps https://github.com/baidu-research/warp-ctc.

@Jenkyrados

Looks good. Thanks a bunch!

06221098 commented Apr 7, 2017

In warp_ctc_layer.cpp there is #include <warp_ctc/ctcpp.h>, but I couldn't find the warp_ctc folder or ctcpp.h in this branch.

ChWick commented Apr 7, 2017

@06221098 If you want to use warp-ctc with Caffe, you need the develop branch of my warp-ctc fork: https://github.com/ChWick/warp-ctc/tree/develop. It adds support for C++ and templates, as required by Caffe. You can compile warp-ctc as a standalone shared library (cmake && make install) and add the installation path to Caffe.

06221098 commented Apr 8, 2017

@ChWick Thank you very much for your prompt response. I am trying it as you suggested.

06221098 commented Apr 8, 2017

@ChWick Hi, ChWick. I compiled warp-ctc-develop with the following steps:
cd warp-ctc-develop
mkdir build
cd build
cmake ../
make
The problem occurs when I execute the last step, "make". I got this log:
Any pointers or suggestions would be helpful. Thanks in advance.

yyy@node5:~/warp-ctc-develop$ mkdir build
yyy@node5:~/warp-ctc-develop$ cd build
yyy@node5:~/warp-ctc-develop/build$ cmake ../
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found CUDA: /usr/local/cuda-7.5 (found suitable version "7.5", minimum required is "6.5")
-- cuda found TRUE
-- Found Torch7 in /home/yyy/torch/install
-- Torch found /home/yyy/torch/install/share/cmake/torch
-- Building shared library with GPU support
-- Building Torch Bindings with GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /home/yyy/warp-ctc-develop/build
yyy@node5:~/warp-ctc-develop/build$ make
[ 20%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/./warpctc_generated_reduce.cu.o
[ 40%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/./warpctc_generated_ctcpp_entrypoint.cu.o
/home/yyy/warp-ctc-develop/src/ctcpp_entrypoint.cu(1): error: this declaration has no storage class or type specifier

/home/yyy/warp-ctc-develop/src/ctcpp_entrypoint.cu(1): error: expected a ";"

2 errors detected in the compilation of "/tmp/tmpxft_00000e27_00000000-16_ctcpp_entrypoint.compute_52.cpp1.ii".
CMake Error at warpctc_generated_ctcpp_entrypoint.cu.o.cmake:264 (message):
Error generating file
/home/yyy/warp-ctc-develop/build/CMakeFiles/warpctc.dir/src/./warpctc_generated_ctcpp_entrypoint.cu.o

make[2]: *** [CMakeFiles/warpctc.dir/src/./warpctc_generated_ctcpp_entrypoint.cu.o] Error 1
make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
make: *** [all] Error 2

06221098 commented Apr 8, 2017

@ChWick Sorry to bother you. I have solved my problems. Thank you very much.

@shengyudingli

@ChWick Excuse me, can you give a simple usage example for the CTCGreedyDecoderLayer? I used this layer on top of a net, but I get strange output; I suppose I used it in the wrong way......

ChWick commented Apr 13, 2017

@shengyudingli
From ctc_decoder_layer.hpp:

// probabilities (T x N x C),
// sequence_indicators (T x N),
// target_sequences (T X N) [optional]
// if a target_sequence is provided, an additional accuracy top blob is
// required
virtual inline int MinBottomBlobs() const { return 2; }
virtual inline int MaxBottomBlobs() const { return 3; }

// sequences (terminated with negative numbers),
// output scores [optional if 2 top blobs and bottom blobs = 2]
// accuracy [optional, if target_sequences as bottom blob = 3]
virtual inline int MinTopBlobs() const { return 1; }
virtual inline int MaxTopBlobs() const { return 3; }

Provide the output of your last InnerProductLayer as the first blob (probabilities), and the sequence indicators used for the LSTM layers as the second blob (for computing the input sequence lengths). Optionally, you can provide a target_sequences blob as the third bottom blob for computing the scores/accuracy.
The greedy decoding algorithm amounts to taking the argmax of the sequence at each time step, then removing blanks and repeated labels.
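That greedy rule can be sketched in a few lines of Python. The blank index is left as a parameter here because where it sits (0 or C-1) depends on how the net was set up; this is an illustration of the rule, not the layer's actual code:

```python
import numpy as np

def ctc_greedy_decode(probs, seq_len, blank):
    """Greedy CTC decoding of one sequence: take the argmax at every
    time step, collapse runs of repeated labels, then drop blanks."""
    best = np.argmax(probs[:seq_len], axis=-1)
    decoded = []
    prev = None
    for label in best:
        # Emit a label only when it differs from the previous step
        # (collapsing repeats) and is not the blank. A blank between two
        # identical labels resets `prev`, so both copies are kept.
        if label != prev and label != blank:
            decoded.append(int(label))
        prev = label
    return decoded

# Toy path over C = 3 classes with blank = 2: argmax path [0, 0, 2, 1, 2, 1]
probs = np.eye(3)[[0, 0, 2, 1, 2, 1]]
print(ctc_greedy_decode(probs, seq_len=6, blank=2))  # [0, 1, 1]
```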
