I think I have fixed the CTCLoss `nan` problem!
Now, please pull the latest code from master and update PyTorch to >= v1.2.0.
Enjoy it!
PS: If CTCLoss still becomes `nan`, please

- change the `batchSize` to a smaller value (e.g. 8, 16, 32)
- change the `lr` to a smaller value (e.g. 0.00001, 0.0001)
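Both settings live in `params.py` (described further below). Here is a sketch of the two relevant entries; the values shown are only examples and the rest of the file is omitted.

```python
# params.py (relevant excerpt) -- only the two settings mentioned above.
# The default values here are illustrative, not the repository's defaults.
batchSize = 16    # try 8 / 16 / 32 if CTCLoss turns to nan
lr = 0.0001       # try 0.0001 / 0.00001 if CTCLoss turns to nan
```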
Tested environment:

- CentOS 7
- Python 3.6.5
- torch==1.2.0
- torchvision==0.4.0
- Nvidia Tesla P40
- Run demo

  `python demo.py -m path/to/model -i data/demo.jpg`
- Variable length

  Variable-length input is supported.
- Change CTCLoss from warp-ctc to torch.nn.CTCLoss

  warp-ctc has to be compiled and seems to support only PyTorch 0.4, while PyTorch now ships `CTCLoss` itself, so I changed the loss function to `torch.nn.CTCLoss` (a usage sketch follows this list).
- Solved PyTorch CTCLoss becoming `nan` after several epochs

  I don't know why, but when I train the net, the loss always becomes `nan` after several epochs. I added a param `dealwith_lossnan` to `params.py`. If you set it to `True`, the net will automatically check the gradients and replace every `nan`/`inf` entry with zero (a sketch of this safeguard follows the list).
- DataParallel

  I added a param `multi_gpu` to `params.py`. If you want to use multiple GPUs to train your net, set it to `True` and set the param `ngpu` to a proper number (see the DataParallel sketch after this list).
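For reference, this is roughly how `torch.nn.CTCLoss` is called with variable-length targets. It is a minimal, self-contained sketch rather than the exact code in this repository; the shapes, class count, and example labels are made up for illustration. `zero_infinity=True` is a built-in PyTorch option that zeroes infinite losses, which complements the `dealwith_lossnan` safeguard described above.

```python
import torch
import torch.nn as nn

# Minimal sketch of torch.nn.CTCLoss with variable-length targets.
# T = input sequence length, N = batch size, C = number of classes (including the blank).
T, N, C = 50, 2, 37

criterion = nn.CTCLoss(blank=0, zero_infinity=True)

# Network output after log_softmax over the class dimension: shape (T, N, C).
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(2)

# Two labels of different lengths, concatenated into a single 1-D tensor.
targets = torch.tensor([5, 12, 7,           # label of sample 0 (length 3)
                        9, 2, 2, 30, 4])    # label of sample 1 (length 5)
input_lengths = torch.full((N,), T, dtype=torch.long)   # every sample emits T time steps
target_lengths = torch.tensor([3, 5], dtype=torch.long)

loss = criterion(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```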
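The `dealwith_lossnan` behaviour described above amounts to scrubbing the gradients after the backward pass. Here is a minimal sketch of that idea, not the repository's exact implementation:

```python
import torch
import torch.nn as nn

def zero_nan_inf_gradients(model: nn.Module) -> None:
    """Replace every nan/inf entry in the parameters' gradients with zero.

    Intended to be called after loss.backward() and before optimizer.step().
    """
    for p in model.parameters():
        if p.grad is not None:
            bad = torch.isnan(p.grad) | torch.isinf(p.grad)
            if bad.any():
                p.grad[bad] = 0.0
```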
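The `multi_gpu` / `ngpu` options map onto PyTorch's standard `nn.DataParallel` wrapper. A minimal sketch, where `crnn` is just a placeholder model:

```python
import torch
import torch.nn as nn

multi_gpu = True   # mirrors the multi_gpu param in params.py
ngpu = 2           # mirrors the ngpu param in params.py

crnn = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU())  # placeholder for the real model

if multi_gpu and torch.cuda.is_available() and ngpu > 1:
    # Split each batch across the first `ngpu` visible GPUs.
    crnn = nn.DataParallel(crnn, device_ids=list(range(ngpu)))

if torch.cuda.is_available():
    crnn = crnn.cuda()
```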
The data loader expects the IAM dataset (or any other dataset that is compatible with it) in the data/ directory. Follow these instructions to get the dataset:

- Register for free at this website.
- Download words/words.tgz.
- Download ascii/words.txt.
- Put words.txt into the data/ directory.
- Create the directory data/words/.
- Put the content (directories a01, a02, ...) of words.tgz into data/words/.
- Go to data/ and run `python checkDirs.py` for a rough check if everything is OK.
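`checkDirs.py` in the repository already performs this check; the sketch below only illustrates the kind of layout test it does, using the paths from the steps above, and is not the script itself.

```python
import os

def rough_check(data_dir="data"):
    """Rough check of the layout described above: data/words.txt plus data/words/a01, ..."""
    ok = os.path.isfile(os.path.join(data_dir, "words.txt"))
    words_dir = os.path.join(data_dir, "words")
    ok = ok and os.path.isdir(words_dir) and any(
        name.startswith("a") for name in os.listdir(words_dir))
    print("looks ok" if ok else "something is missing -- re-check the steps above")

if __name__ == "__main__":
    rough_check()
```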
Parameters and alphabets can't always be the same in different situations.

- Change parameters

  See `params.py` for details.

- Change alphabets

  Please put all the characters that appear in your labels into `alphabets.py`, or the program will throw an error during training (a helper sketch for collecting them follows).
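One quick way to find every character in your labels is a small script like the one below. The label file name and its tab-separated "path, label" format are assumptions, so adapt it to however your labels are actually stored.

```python
# Collect every distinct character that appears in your labels, so the result can
# be pasted into alphabets.py. File name and format below are assumptions.
chars = set()
with open("train_labels.txt", encoding="utf-8") as f:
    for line in f:
        _, label = line.rstrip("\n").split("\t", 1)
        chars.update(label)

print("".join(sorted(chars)))   # copy this string into alphabets.py
```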
Run train.py:

`python train.py --trainroot path/to/train/dataset --valroot path/to/val/dataset`