[OCR] Part of the log is lost when printing output-layer information during Paddle prediction #2421

Closed
xieshufu opened this issue Jun 8, 2017 · 3 comments


xieshufu commented Jun 8, 2017

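I want to use Paddle's training program to run prediction on the training data. To speed prediction up, I sorted the training data by image width (batch_size=32) and then added the following to network.conf to print the output layer's information: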

Evaluator(name = "output_printer",
          type = "max_id_printer",
          inputs = ["output"])
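A minimal C++ sketch of the width-sorting step described above, assuming the width of every image is known before batching. `BatchByWidth` is a hypothetical helper for illustration, not part of Paddle: it sorts sample indices by width and cuts them into batches of `batch_size` (32 in this report), so images in the same batch need little padding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not a Paddle API): groups sample indices into
// batches of at most batch_size after sorting the samples by image width.
std::vector<std::vector<std::size_t>> BatchByWidth(
    const std::vector<int>& widths, std::size_t batch_size) {
  assert(batch_size > 0);  // a zero batch size would never advance
  std::vector<std::size_t> order(widths.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  // stable_sort keeps samples with equal width in their original order.
  std::stable_sort(order.begin(), order.end(),
                   [&widths](std::size_t a, std::size_t b) {
                     return widths[a] < widths[b];
                   });
  std::vector<std::vector<std::size_t>> batches;
  for (std::size_t i = 0; i < order.size(); i += batch_size) {
    std::size_t end = std::min(i + batch_size, order.size());
    batches.emplace_back(order.begin() + i, order.begin() + end);
  }
  return batches;
}
```

For example, widths {300, 100, 200, 100, 50} with batch_size=2 give the batches {4, 1}, {3, 2} and {0}.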

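After the program finished, I found that some of the output in the log was missing, and I don't know why.
Here is part of the log (line 25967264 was not printed completely):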

25967255 88 : 16.8427,
25967256 95 : 21.5762,
25967257 95 : 18.9083,
25967258 78 : 17.2552,
25967259 95 : 20.2771,
25967260 95 : 18.9346,
25967261 84 : 18.7365,
25967262 95 : 19.8724,
25967263 94 : 17.5951,
25967264 9
25967265 I0607 21:51:35.054631 24851 Trainer.cpp:706]  Pass=0 samples=700480
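One way to find damaged records in a very long log is to check every printed entry against the shape of the complete ones. This sketch assumes, based only on the excerpt above, that a complete `max_id_printer` entry reads `<id> : <score>,` and that the log's own line-number prefix has already been stripped; `IsCompleteMaxIdEntry` is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the line looks like a complete
// "<id> : <score>," entry such as "88 : 16.8427,". Anything else,
// e.g. the lone "9" on line 25967264, is treated as truncated.
bool IsCompleteMaxIdEntry(const std::string& line) {
  // Score: optional sign, decimal number, optional exponent, then a comma.
  static const std::regex kEntry(
      R"(\d+ : [-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?,)");
  return std::regex_match(line, kEntry);  // must match the whole line
}
```

With this check, `"88 : 16.8427,"` passes while `"9"` is flagged.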

xieshufu commented Aug 28, 2017

Now I want Paddle to print the values of a particular layer, and I see a similar problem (prediction with batch_size=1):
Evaluator(name = "fc_printer",
          type = "value_printer",
          inputs = ["gated1", "gated2"])
Part of one printed line is lost:
6.04295 0.858446 3.42567 7.06977 6.31701 1.5272 32.2323 0.809208 7.33613 6.54205 2.34425 0 12.7624 9.07614 0.953456 0.00835749 1.61342 0.048701 0.921395 15.8897 0.00332694 3.28875 6.22427e-07 3.82872 7.13163 0.119832 7.14483e-30 9.76204 1.41649 2.89394e-06 0.0507187 1.99983 8.69143e-07 0.00102601 14.0381 0.000560347 0.00788421 20.6291 8.71669 0.00575366 1.7155 1.48578 9.28934 0.00482429 0.447377 9.94948 3.67682 7.47722 4.06594 0.899231 3.31875 1.23859 26.8689 12.7628 18.4871 3.00375 3.49557 0.125995 0.140014 0.0464331 7.06036 6.95269 0 0.139714 1.84226 0.141156 0 0.0171961 0.460824 2.06683 0.00600001 2.72656e-17 0.195686 2.00234 16.9713 1.59295 0.0393303 0.677337 6.14424 4.81041e-18 0.355956 5.64359 1.80939 18.7533 1.47942 16.
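The line is cut off at the final "16."; the rest of it was lost.
Also, because the network uses RNN and CTC layers, I wanted to print the features just before the softmax layer, and found that the number of time steps in the sequence does not match the number of feature rows actually printed.
// The number of feature time steps shown here is 23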
I0818 18:19:17.711738 14799 Evaluator.cpp:909] layer=gated1 sequence pos vector:
[0 23 ]
But only 19 feature rows were actually printed, not 23:
67 9.28149 0 3.9894 0 0 2.62564 0 0.973578 4.56105 9.06557 0 0 0.944749 2.56249
... ....
85 6.04295 0.858446 3.42567 7.06977 6.31701 1.5272 32.2323 0.809208 7.33613 6.54205
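As far as we understand Paddle's (v1) sequence layout, the "sequence pos vector" holds cumulative start offsets, so `[0 23]` describes one sequence of 23 - 0 = 23 time steps, and the printer should emit 23 rows for it. The sketch below makes that check explicit. `SequenceLengths`, `ExpectedRows` and `CountRows` are hypothetical helpers, and treating each non-blank printed line as one time step is an assumption about the printer's output format.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Assumes pos holds cumulative start offsets: sequence i covers rows
// [pos[i], pos[i + 1]). Returns the number of time steps per sequence.
std::vector<int> SequenceLengths(const std::vector<int>& pos) {
  std::vector<int> lengths;
  for (std::size_t i = 0; i + 1 < pos.size(); ++i) {
    assert(pos[i + 1] >= pos[i]);  // offsets must be non-decreasing
    lengths.push_back(pos[i + 1] - pos[i]);
  }
  return lengths;
}

// Total number of rows the printer should emit for the whole batch.
int ExpectedRows(const std::vector<int>& pos) {
  return pos.empty() ? 0 : pos.back() - pos.front();
}

// Counts non-blank lines in a printed value matrix (assumed: one row each).
int CountRows(const std::string& text) {
  std::istringstream in(text);
  std::string line;
  int rows = 0;
  while (std::getline(in, line)) {
    if (line.find_first_not_of(" \t\r") != std::string::npos) ++rows;
  }
  return rows;
}
```

With the positions `[0 23]` from the log, `ExpectedRows` returns 23, while only 19 rows were captured, so four rows are missing.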

xieshufu commented

@qingqing01 Is this because Paddle prints the log information with code like this:
LOG(INFO) << "layer=" << name << " value matrix:\n" << os.str();
and the message is truncated when the string is too long?
I tried writing the information to a file instead, and nothing was lost:
std::string os_str = os.str();
FILE *file_log = fopen("./data.txt", "a");  // append mode keeps every batch's output
if (file_log != nullptr) {
  fprintf(file_log, "layer=%s value matrix:\n%s\n", name.c_str(), os_str.c_str());
  fclose(file_log);
}
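glog formats each `LOG(...)` statement into a fixed-size buffer (`LogMessage::kMaxLogMessageLen`, 30000 bytes in the glog versions we know of) and truncates anything longer, which fits the symptom above. Besides writing to a file, another workaround is to split a large dump into several smaller messages. This is a minimal sketch assuming a chunk size safely below that limit; `SplitForLogging` is a hypothetical helper, not necessarily how #3810 fixed the problem.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits text into pieces of at most max_len bytes,
// preferring to break right after a '\n' so matrix rows stay intact.
// A single line longer than max_len is split mid-line.
std::vector<std::string> SplitForLogging(const std::string& text,
                                         std::size_t max_len) {
  assert(max_len > 0);
  std::vector<std::string> chunks;
  std::size_t start = 0;
  while (start < text.size()) {
    std::size_t end = std::min(start + max_len, text.size());
    if (end < text.size()) {
      // Last newline inside [start, end), if any.
      std::size_t nl = text.rfind('\n', end - 1);
      if (nl != std::string::npos && nl >= start) end = nl + 1;
    }
    chunks.push_back(text.substr(start, end - start));
    start = end;
  }
  return chunks;
}
```

Each piece would then get its own statement, e.g. `for (const std::string& piece : SplitForLogging(os.str(), 20000)) LOG(INFO) << piece;`. Pieces usually end in `\n` and glog adds its own line break, so trimming that newline avoids blank lines in the log.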


luotao1 commented Sep 6, 2017

Fixed by #3810.

luotao1 closed this as completed Sep 6, 2017
heavengate pushed a commit to heavengate/Paddle that referenced this issue Aug 16, 2021