paddle v2: CTC error is not shown in the training log #3802

Closed
xieshufu opened this issue Sep 1, 2017 · 8 comments · Fixed by #3844
Labels
User (used to mark user questions)

@xieshufu

xieshufu commented Sep 1, 2017

I configured a network and ran it under PaddlePaddle v2. After adding the ctc_error evaluator, no error metrics appear in the log:

ctc_eval = paddle.evaluator.ctc_error(input=output, label=lbl)
trainer = paddle.trainer.SGD(
    cost=cost,
    parameters=parameters,
    update_equation=momentum_optimizer,
    extra_layers=ctc_eval)

The training log shows the following (note the empty metrics dict `{}`):

shuffle batch list len: 30208

Pass 0, Batch 0, Cost 64.853592, {}
.........
Pass 0, Batch 10, Cost 55.328705, {}
.........
Pass 0, Batch 20, Cost 54.045830, {}
.........
Pass 0, Batch 30, Cost 63.349789, {}
.........
Pass 0, Batch 40, Cost 28.554266, {}
.........
Pass 0, Batch 50, Cost 27.816626, {}
.........
Pass 0, Batch 60, Cost 22.405493, {}

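Could the PaddlePaddle team please help with this?
I would like the log to show information like the following, which makes it easier to judge the model's accuracy: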

I0901 05:03:12.769793  5485 TrainerInternal.cpp:165]  Batch=312 samples=9984 AvgCost=1.06051 CurrentCost=1.06051 Eval: error=0.0436963  deletions error=0.00442531  insertions error=0.00216684  substitutions error=0.0371042  sequences error=0.151542  CurrentEval: error=0.0436963  deletions error=0.00442531  insertions error=0.00216684  substitutions error=0.0371042  sequences error=0.151542
@wanghaoshuang
Contributor

@xieshufu Hi, could you please post the complete configuration?

@xieshufu
Author

xieshufu commented Sep 1, 2017

import gzip
import sys
import paddle.v2 as paddle
from ocr_8conv import ocr_8conv_net
from ocr_4conv import ocr_4conv_net
#from ocr_reader import train_reader, test_reader
from ocr_data import DataGenerator

def main():
    datadim = 48 * 48 * 1
    classdim = 21501

    # PaddlePaddle init
    paddle.init(use_gpu=True, trainer_count=1)

    image = paddle.layer.data(
        name="image", type=paddle.data_type.dense_vector(datadim))

    # Add neural network config: 4-conv OCR net with one extra output
    # class (index classdim) reserved for the CTC blank
    output = ocr_4conv_net(image, classdim+1)

    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value_sequence(classdim))
    cost = paddle.layer.warp_ctc(
        input=output,
        label=lbl,
        size=classdim+1,
        blank=classdim,
        norm_by_times=True)
    ctc_eval = paddle.evaluator.ctc_error(input=output, label=lbl)

    # Create parameters
    model_path = ""
    if model_path == "":
        parameters = paddle.parameters.create(cost)
    else:
        parameters = paddle.parameters.Parameters.from_tar(gzip.open(model_path))
    
    train_list_path = "./train_3W.list"
    test_list_path = "./test_image.list"
    train_generator = DataGenerator(file_list_path=train_list_path)
    test_generator = DataGenerator(file_list_path=test_list_path)
    train_batch_reader = train_generator.batch_train_reader_creator(batch_size=32)
    test_batch_reader = test_generator.batch_test_reader_creator(batch_size=1)

    # Create optimizer
    momentum_optimizer = paddle.optimizer.Momentum(
        momentum=0.9,
        learning_rate=0.001)

    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 100 == 0:
                print "\nPass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
                # `trainer` is created below, before train() invokes this handler
                result = trainer.test(
                    reader=test_batch_reader,
                    feeding={'image': 0,
                             'label': 1})
                print "\nTest with Pass %d_%d, %s" % (
                    event.pass_id, event.batch_id, result.metrics)
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
            # save parameters
            with open('./result/params_pass_%d.tar' % event.pass_id, 'wb') as f:
                parameters.to_tar(f)

            result = trainer.test(
                reader=test_batch_reader,
                feeding={'image': 0,
                         'label': 1})
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)

    # Create trainer
    trainer = paddle.trainer.SGD(
        cost=cost, 
        parameters=parameters, 
        update_equation=momentum_optimizer,
        extra_layers=ctc_eval)
    trainer.train(
        reader=train_batch_reader,
        num_passes=200,
        event_handler=event_handler,
        feeding={'image': 0,'label': 1})

if __name__ == '__main__':
    main()

@lcy-seso lcy-seso added the User (used to mark user questions) label Sep 4, 2017
@lcy-seso
Contributor

lcy-seso commented Sep 4, 2017

The CTC error evaluator indeed cannot produce output under v2. This should be fixed.

@lcy-seso
Contributor

lcy-seso commented Sep 4, 2017

The CTC evaluator hits the same problem as the CRF evaluator, so it can be fixed on the same principle as this PR:
#2165

@xieshufu
Author

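The training-set output of the old PaddlePaddle and of paddle v2 do not correspond. The old version prints two parts: one is the metric over all samples processed so far, and the other is the metric over the samples processed in the current period. paddle v2 prints only one part. Which part of the old PaddlePaddle log does it correspond to?
The old PaddlePaddle log output looks like this: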
I0904 02:54:10.453464 2406 TrainerInternal.cpp:165] Batch=114816 samples=3674112 AvgCost=8.78967 CurrentCost=2.87923 Eval: error=0.278402 deletions error=0.0934485 insertions error=0.00498174 substitutions error=0.179751 sequences error=0.542304 CurrentEval: error=0.119589 deletions error=0.0131632 insertions error=0.00411695 substitutions error=0.102309 sequences error=0.350561

The paddle v2 log output looks like this:
Pass 0, Batch 114999, Cost 3.997641, {'ctc_error_evaluator_0.error': 0.2395833432674408, 'ctc_error_evaluator_0.insertion_error': 0.0, 'ctc_error_evaluator_0.substitution_error': 0.1015624925494194, 'ctc_error_evaluator_0.sequence_error': 0.4375, 'ctc_error_evaluator_0.deletion_error': 0.1380208283662796}
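
To compare the two logs side by side, the v2 metrics dict can be rendered in the old trainer's layout. The following is only a sketch: the key suffixes and labels are copied from the two log lines above, `format_old_style` is a hypothetical helper name, and it assumes a single evaluator (with several evaluators the suffixes would collide).

```python
# Mapping from the metric-name suffixes seen in the v2 log line to the
# labels used in the old trainer's log line (both taken from the logs above).
OLD_STYLE_LABELS = [
    ("error", "error"),
    ("deletion_error", "deletions error"),
    ("insertion_error", "insertions error"),
    ("substitution_error", "substitutions error"),
    ("sequence_error", "sequences error"),
]


def format_old_style(metrics):
    """Render a v2-style metrics dict in the old log's field order."""
    by_suffix = {}
    for name, value in metrics.items():
        # "ctc_error_evaluator_0.error" -> "error"
        by_suffix[name.rsplit(".", 1)[-1]] = value
    parts = ["%s=%g" % (label, by_suffix[suffix])
             for suffix, label in OLD_STYLE_LABELS if suffix in by_suffix]
    return " ".join(parts)


v2_metrics = {
    "ctc_error_evaluator_0.error": 0.2395833432674408,
    "ctc_error_evaluator_0.insertion_error": 0.0,
    "ctc_error_evaluator_0.substitution_error": 0.1015624925494194,
    "ctc_error_evaluator_0.sequence_error": 0.4375,
    "ctc_error_evaluator_0.deletion_error": 0.1380208283662796,
}
print(format_old_style(v2_metrics))
```

This only changes the presentation. It does not answer which of the old log's two parts (Eval or CurrentEval) the v2 numbers correspond to.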

@xieshufu
Author

The training error rate in paddle v2 fluctuates considerably; see the table below:

pass train_seq_err test_seq_err
Test with Pass 0_9999 1 1
Test with Pass 0_19999 0.8438 0.9752
Test with Pass 0_29999 0.5938 0.8947
Test with Pass 0_39999 0.75 0.8669
Test with Pass 0_49999 0.75 0.8272
Test with Pass 0_59999 0.4062 0.8213
Test with Pass 0_69999 0.4375 0.799
Test with Pass 0_79999 0.375 0.806
Test with Pass 0_89999 0.625 0.7855
Test with Pass 0_99999 0.5312 0.8168
Test with Pass 0_109999 0.625 0.766
Test with Pass 0 0.4375 0.7573
Test with Pass 1_9999 0.2812 0.7943
Test with Pass 1_19999 0.625 0.7763
Test with Pass 1_29999 0.5312 0.7606
Test with Pass 1_39999 0.5 0.754
Test with Pass 1_49999 0.2812 0.7521
Test with Pass 1_59999 0.3438 0.7527
Test with Pass 1_69999 0.625 0.7433
Test with Pass 1_79999 0.375 0.7467
Test with Pass 1_89999 0.5 0.7315
Test with Pass 1_99999 0.3438 0.7284
Test with Pass 1_109999 0.125 0.7343
Test with Pass 1 0.3125 0.7405
Test with Pass 2_9999 0.25 0.7442
Test with Pass 2_19999 0.5625 0.7333
Test with Pass 2_29999 0.1875 0.7231
Test with Pass 2_39999 0.4062 0.7237
Test with Pass 2_49999 0.6875 0.7209
Test with Pass 2_59999 0.5938 0.729
Test with Pass 2_69999 0.2188 0.7253
Test with Pass 2_79999 0.2188 0.725
Test with Pass 2_89999 0.3438 0.7185
Test with Pass 2_99999 0.5 0.7114
Test with Pass 2_109999 0.9688 0.7185
Test with Pass 2 0.125 0.7176

The training error rate of the old PaddlePaddle does not fluctuate nearly as much:

pass-00000-001 0.850588 0.999135
pass-00000-002 0.674114 0.790111
pass-00000-003 0.582831 0.790111
pass-00000 0.542304 0.735162
pass-00001-001 0.335244 0.710793
pass-00001-002 0.329335 0.710793
pass-00001-003 0.320777 0.694049
pass-00001 0.316103 0.683987
pass-00002-001 0.28279 0.683987
pass-00002-002 0.280995 0.672746
pass-00002-003 0.278213 0.669916
pass-00002 0.276203 0.663234
pass-00003-001 0.259311 0.656788
pass-00003-002 0.25923 0.656788
pass-00003-003 0.257976 0.655137
pass-00003 0.257175 0.64987
pass-00004-001 0.247375 0.64987
pass-00004-002 0.247382 0.647276
pass-00004-003 0.245491 0.641852
pass-00004 0.2447 0.641852
pass-00005-001 0.23903 0.638079
pass-00005-002 0.237361 0.636978
pass-00005-003 0.237547 0.634541
pass-00005 0.237378 0.630061
pass-00006-001 0.228743 0.630061
pass-00006-002 0.233057 0.628567
pass-00006-003 0.232819 0.625894
pass-00006 0.231984 0.625894
pass-00007-001 0.228352 0.62275
pass-00007-002 0.226918 0.621256
pass-00007-003 0.225997 0.621256
pass-00007 0.225452 0.618976
pass-00008-001 0.219648 0.618033
pass-00008-002 0.221626 0.618819
pass-00008-003 0.221625 0.616225
pass-00008 0.221795 0.616225
pass-00009-001 0.222509 0.616147
pass-00009-002 0.220557 0.614653
pass-00009-003 0.225 0.614653
pass-00009 0.224375 0.61371
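
One observation about the v2 table: its train_seq_err values move in steps of 1/32, which matches batch_size=32 in the posted config. That suggests each printed value covers a single batch of 32 sequences, while the old trainer's values are averaged over far more samples. A small sketch to check this, with values copied from the table and a hypothetical helper name:

```python
# First rows of the v2 train_seq_err column, and the batch size from the
# posted training config.
V2_TRAIN_SEQ_ERR = [1.0, 0.8438, 0.5938, 0.75, 0.75, 0.4062, 0.4375, 0.375]
BATCH_SIZE = 32


def implied_error_count(rate, batch_size):
    """Return the whole number of wrong sequences that `rate` implies for
    one batch, or None if `rate` is not a multiple of 1 / batch_size
    (allowing for the table being rounded to 4 decimals)."""
    count = rate * batch_size
    nearest = int(round(count))
    if abs(count - nearest) <= 1e-4 * batch_size:
        return nearest
    return None


counts = [implied_error_count(r, BATCH_SIZE) for r in V2_TRAIN_SEQ_ERR]
print(counts)
```

Every v2 value maps to a whole number of sequences out of 32, whereas a value from the old log such as 0.850588 does not.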

@luotao1
Contributor

luotao1 commented Sep 21, 2017

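The metrics printed by paddle v2 are those for the samples processed in the current period. The metrics over all samples processed so far have to be computed by the user.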
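
A minimal sketch of such user-side accumulation, assuming the per-batch metrics arrive as a dict like the `event.metrics` printed earlier in this thread. `RunningMetrics` is a hypothetical helper, not a PaddlePaddle API. Weighting by batch size is exact for a per-sequence rate such as the sequence error. For the token-level rates it is only an approximation, because those are presumably normalized by label length rather than by sequence count.

```python
from collections import defaultdict


class RunningMetrics(object):
    """Sample-weighted running average of per-batch metric dicts."""

    def __init__(self):
        self._weighted_sum = defaultdict(float)
        self._num_samples = 0

    def update(self, metrics, batch_size):
        """Add one batch's metrics, weighted by its number of samples."""
        for name, value in metrics.items():
            self._weighted_sum[name] += value * batch_size
        self._num_samples += batch_size

    def average(self):
        """Return the weighted average of every metric seen so far."""
        if self._num_samples == 0:
            return {}
        return {name: total / self._num_samples
                for name, total in self._weighted_sum.items()}


running = RunningMetrics()
running.update({"sequence_error": 0.5}, batch_size=32)
running.update({"sequence_error": 0.25}, batch_size=32)
print(running.average())
```

In the training script posted above, `running.update(event.metrics, 32)` could be called on every `EndIteration` event and `running.average()` printed alongside the per-batch values.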

@xieshufu
Author

xieshufu commented Sep 21, 2017

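Because this is the evaluation of a sequence recognition model, it involves string-to-string measures such as the sequence error rate, insertions, deletions, and substitutions. That does not seem easy to compute outside the framework.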
I0921 13:07:15.047029 9633 TrainerInternal.cpp:165] Batch=1872 samples=59904 AvgCost=38.757 CurrentCost=36.1531 Eval: error=0.999983 deletions error=0.999413 insertions error=0 substitutions error=0.000570641 sequences error=1 CurrentEval: error=1 deletions error=1 insertions error=0 substitutions error=0 sequences error=1
Also, is the CTC cost the cost over the samples in the current period, i.e. does it correspond to CurrentCost in the old version?
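
For reference, the string-level quantities mentioned above can be computed outside the trainer once decoded label sequences are available. The following is a plain-Python sketch, not PaddlePaddle's implementation: `edit_ops` and `corpus_rates` are hypothetical names, and the normalization (token-level rates by total reference length, sequence error by number of sequences) is an assumption.

```python
def edit_ops(reference, hypothesis):
    """Return (substitutions, deletions, insertions) along one minimum-cost
    alignment that turns `reference` into `hypothesis`."""
    n, m = len(reference), len(hypothesis)
    # dp[i][j] = (cost, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # delete every reference token
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # insert every hypothesis token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, subs, dels, ins = dp[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                best = (cost, subs, dels, ins)          # match
            else:
                best = (cost + 1, subs + 1, dels, ins)  # substitution
            cost, subs, dels, ins = dp[i - 1][j]
            if cost + 1 < best[0]:
                best = (cost + 1, subs, dels + 1, ins)  # deletion
            cost, subs, dels, ins = dp[i][j - 1]
            if cost + 1 < best[0]:
                best = (cost + 1, subs, dels, ins + 1)  # insertion
            dp[i][j] = best
    return dp[n][m][1:]


def corpus_rates(pairs):
    """Aggregate error rates over (reference, hypothesis) pairs.

    Assumption: token-level rates are divided by the total reference
    length, the sequence error by the number of sequences.
    """
    subs = dels = ins = ref_len = wrong = count = 0
    for reference, hypothesis in pairs:
        s, d, i = edit_ops(reference, hypothesis)
        subs, dels, ins = subs + s, dels + d, ins + i
        ref_len += len(reference)
        wrong += int(s + d + i > 0)
        count += 1
    ref_len = float(max(ref_len, 1))
    return {
        "error": (subs + dels + ins) / ref_len,
        "substitution_error": subs / ref_len,
        "deletion_error": dels / ref_len,
        "insertion_error": ins / ref_len,
        "sequence_error": wrong / float(max(count, 1)),
    }


rates = corpus_rates([([1, 2, 3], [1, 3]), ([4, 5], [4, 5])])
print(rates)
```

The hypotheses would still have to be obtained by decoding the network output (for example, best-path decoding with repeats collapsed and blanks removed), which this sketch does not cover.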
